Abstract

We consider multi-terminal source coding with a single encoder and multiple decoders where either the encoder 
or the decoders can take cost constrained actions which affect the quality of the side information present at the 
decoders. For the scenario where decoders take actions, we characterize the rate-cost trade-off region for lossless 
source coding, and give an achievability scheme for lossy source coding for two decoders which is optimum for 
a variety of special cases of interest. For the case where the encoder takes actions, we characterize the rate-cost 
trade-off for a class of lossless source coding scenarios with multiple decoders. Finally, we also consider extensions 
to other multi-terminal source coding settings with actions, and characterize the rate-distortion-cost tradeoff for a
case of successive refinement with actions. 

I. Introduction 

The problem of source coding with decoder side information (S.I.) was introduced in [1]. S.I. acts as an important 
resource in rate distortion problems, where it can significantly reduce the compression rate required. In classical 
Shannon theory and in work building on [1], S.I. is assumed to be either always present or always absent. In practical systems, however, acquiring S.I. is costly: the encoder or decoder has to expend resources to acquire it. With this motivation, the framework of source coding with action-dependent S.I. was introduced in [2], where the authors considered the cases in which the encoder or the decoder is allowed to take actions (subject to cost constraints) that affect the quality or availability of the side information present at the decoders and, in some settings, at the encoder. As noted in [2], one motivation for this setup is the case where
the side information is obtained via a sensor through a sequence of noisy measurements of the source sequence. 
The sensor may have limited resources, such as acquisition time or power, in obtaining the side information. This 
is therefore modeled by the cost constraint on the action sequence to be taken at the decoder. Additional motivation 
for considering this framework is given in [2]. We also refer readers to recent work in [3], [4] for related Shannon 
theoretic scenarios invoking the action framework. 

In this paper, we extend the source coding with action framework to the case where there are multiple decoders, 
which can take actions that affect the quality or availability of S.I. at each decoder, or where the encoder takes 
actions that affect the quality or availability of S.I. at the decoders. As a motivation for this framework, consider the 
following problem. An encoder observes an i.i.d. source sequence $X^n$, which it wishes to describe to two decoders via a common rate-limited link of rate $R$. In addition to observing the output of the common link, the decoders also have access to a common sensor that provides side information $Y$ correlated with $X$. However, because of contention or resource constraints, when decoder 1 observes the side information, decoder 2 cannot access it, and vice versa. This problem is depicted in Figure 1. Even without any cost associated with switching the sensor to decoder 1 or decoder 2, this problem is interesting and non-trivial: how should the decoders share the side information, and what is the optimum sequence of actions to be conveyed to, and then taken by, the decoders?

By posing the above problem in the framework of source coding with action-dependent side information, we solve it for the (near) lossless source coding case and for a special case of lossy source coding with switching-dependent side information, and we give interpretations of the standard random binning and coding arguments when specialized to this switching problem. As one example of the implications of our findings, when $Y = X$, we show that the optimum rate required for lossless source coding in the above problem is $H(X)/2$. This is clearly a lower bound on the required rate; that it suffices for perfect reconstruction of the source simultaneously at both decoders is, at first glance, surprising. We devote a significant portion of this paper to the setting where the side information at the decoders is obtained through a switch that determines which of the two decoders gets to observe the side information, and we obtain a complete characterization of the fundamental performance limits in various scenarios involving such switching. The achievability schemes in these scenarios are interesting in their own right, and also provide insight into more general cases.

Fig. 1: Lossless source coding with switching-dependent side information. When the switch is at position 1, decoder 1 observes the side information. When the switch is at position 2, decoder 2 observes the side information.

The rest of the paper is organized as follows. In Section II, we provide formal definitions and problem formulations for the cases considered. In Section III, we consider the setting of lossless source coding with decoders taking actions subject to cost constraints, and give the optimum rate-cost trade-off region for this setting. In Section IV, we consider the setting of lossy source coding with decoders taking actions subject to cost constraints, and give a general achievability scheme for this setup. We then specialize our achievability scheme to obtain the optimum rate-distortion-cost trade-off region for a number of special cases. In Section V, we consider the setting where actions are taken by the encoder. The rate-distortion-cost tradeoff for this setting is open even in the single-decoder case. Hence, we only consider a special case of lossless source coding for which we can characterize the rate-cost tradeoff. In Section VI, we extend our setup to two other multi-user settings, including the case of successive refinement with actions. The paper is concluded in Section VII.

II. Problem Definition 

In this section, we give formal definitions for, and focus on, the case where there are two decoders. Generalization of the definitions to $K$ decoders is straightforward and, as we indicate in subsequent sections, some of our results hold in the $K$-decoder setting. We follow the notation of [5]. We use $A$ to denote the action random variable. The distortion measure between sequences is defined in the usual way: let $d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$; then $d(x^n, \hat{x}^n) := \frac{1}{n}\sum_{i=1}^{n} d(x_i, \hat{x}_i)$. The cost constraint is also defined in the usual fashion: let $\Lambda(a^n) := \frac{1}{n}\sum_{i=1}^{n} \Lambda(a_i)$. Throughout this paper, sources $(X^n, Y^n)$ are specified by the joint distribution $p(x^n, y^n) = \prod_{i=1}^{n} p_{X,Y}(x_i, y_i)$ (i.i.d.). The decoders obtain side information through a discrete memoryless action channel $p_{Y_1,Y_2|X,A}$ specified by the conditional distribution $p(y_1^n, y_2^n | x^n, a^n) = \prod_{i=1}^{n} p_{Y_1,Y_2|X,A}(y_{1i}, y_{2i} | x_i, a_i)$, with decoder $j$ obtaining side information $Y_j^n$ for $j \in \{1, 2\}$. Extensions to more than two sources or more than two channel outputs for multiple decoders are straightforward.
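The two per-letter averages above can be written out directly. The sketch below assumes finite alphabets represented as ordinary Python values and uses a Hamming distortion and a 0/1 switching cost purely for illustration; the helper names `avg_distortion`, `avg_cost`, `hamming` and `switch_cost` are hypothetical.

```python
from typing import Callable, Sequence


def avg_distortion(x: Sequence, xhat: Sequence, d: Callable) -> float:
    """Block distortion d(x^n, xhat^n) = (1/n) * sum_i d(x_i, xhat_i)."""
    if len(x) != len(xhat) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d(a, b) for a, b in zip(x, xhat)) / len(x)


def avg_cost(a: Sequence, cost: Callable) -> float:
    """Block action cost Lambda(a^n) = (1/n) * sum_i Lambda(a_i)."""
    if len(a) == 0:
        raise ValueError("action sequence must be non-empty")
    return sum(cost(ai) for ai in a) / len(a)


def hamming(u, v) -> float:
    """Hamming distortion: 0 if the symbols agree, 1 otherwise."""
    return 0.0 if u == v else 1.0


def switch_cost(a) -> float:
    """Illustrative cost: action 1 costs one unit, any other action is free."""
    return 1.0 if a == 1 else 0.0


print(avg_distortion([0, 1, 1, 0], [0, 1, 0, 0], hamming))  # 0.25
print(avg_cost([1, 2, 2, 1, 2], switch_cost))  # 0.4
```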

A. Source coding with actions taken at the decoders 

This setting for two decoders is shown in Figure 2. An $(n, 2^{nR})$ code for the above setting consists of one encoder

$f : \mathcal{X}^n \to M \in [1 : 2^{nR}]$,

one joint action encoder at all decoders

$f_{A\text{-Dec}} : M \in [1 : 2^{nR}] \to \mathcal{A}^n$,

and two decoders

$g_1 : \mathcal{Y}_1^n \times [1 : 2^{nR}] \to \hat{\mathcal{X}}_1^n$,

$g_2 : \mathcal{Y}_2^n \times [1 : 2^{nR}] \to \hat{\mathcal{X}}_2^n$.
Fig. 2: Lossy source coding with actions at the decoders.
Fig. 3: Lossy source coding with actions at the encoder.

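Given a distortion-cost tuple $(D_1, D_2, C)$, a rate $R$ is said to be achievable if, for any $\epsilon > 0$ and $n$ sufficiently large, there exists an $(n, 2^{nR})$ code such that

$$\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} d_j(X_i, \hat{X}_{ji})\right] \le D_j + \epsilon, \quad j = 1, 2,$$

$$\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \Lambda(A_i)\right] \le C.$$

The rate-distortion-cost function $R(D_1, D_2, C)$ is defined as the infimum of all achievable rates.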

Causal reconstruction with action-dependent side information: Some results in this paper involve the case of causal reconstruction. In the case of causal reconstruction, decoder $j$ reconstructs $\hat{X}_{ji}$ based only on the received message $M$ and the side information up to time $i$. That is,

$$g_{j,i} : \mathcal{Y}_j^{i} \times [1 : 2^{nR}] \to \hat{\mathcal{X}}_j$$

for $j \in \{1, 2\}$ and $i \in [1 : n]$.

Remark 2.1: The case of the decoders taking separate actions $A_1$ and $A_2$, respectively, is a special case of our setup, since we can write $A := (A_1, A_2)$.

Remark 2.2: For the reconstruction mappings, we excluded the action sequence as an input, since $A^n$ is a function of the other input $M$. Nevertheless, $A$ appears explicitly in our (information) rate expressions.
As we will see in the next subsection, an advantage of this definition is that it carries over to the case when the 
encoder takes actions rather than the decoders. 
B. Source coding with action taken at the encoder

This setting is shown in Figure 3. As the definitions and problem statement for this case are similar to those of the first setting, we only mention the differences between the two settings. The main difference is that the encoder takes actions rather than the decoders. Therefore, in the definition of a code, we replace the joint action encoder at the decoders with the encoder taking actions given by the function

$f_{A\text{-Enc}} : \mathcal{X}^n \to \mathcal{A}^n$.

As in the setting of actions taken at the decoder, here too we assume that the side information observed by the 
decoders is not available at the encoder. In subsequent sections we also describe the results pertaining to the case 
where side information is available at the encoder. 

Remark 2.3: Lossless source coding. Some of our results concern the case of lossless source coding. In this case, the definitions are similar, except that the distortion constraints $D_1, D_2$ are replaced by the block probability of error constraint $P(\{\hat{X}_1^n \neq X^n\} \cup \{\hat{X}_2^n \neq X^n\}) \le \epsilon$.



III. Lossless source coding with actions at the decoders 

In this section and the next, we consider the case of source coding with actions taken at the decoders. We first 
present results for the lossless source coding setting. While the lossless case can be taken to be a special case of 
lossy source coding, we present them separately, as we are able to obtain stronger results for more general scenarios 
in the lossless setting, and give several interesting examples that arise from this setup. The case of lossy source 
coding for two decoders is presented in section IV. 

For the lossless case, we first state the result for the general case of K decoders. Our result is stated in Theorem 1. 

Theorem 1: Let the action channel be given by the conditional distribution $p_{Y_1,Y_2,\ldots,Y_K|X,A}$, with decoder $j$ observing the side information $Y_j$. Then, the minimum rate required for lossless source coding with actions taken at the decoders and cost constraint $C$ is given by

$$R = \min \left[ \max_{j \in [1:K]} H(X|Y_j, A) + I(X; A) \right],$$

where the minimum is taken over the distributions $p(x)p(a|x)p(y_1, y_2, \ldots, y_K|x, a)$ such that $\mathbb{E}\Lambda(A) \le C$.
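For a fixed action policy $p(a|x)$, the objective inside the minimum in Theorem 1 is a finite sum that can be evaluated numerically. The sketch below is an illustration under stated assumptions, not part of the original development: it assumes finite alphabets indexed from 0, and it only takes the per-decoder marginals $p(y_j|x,a)$, since the objective involves each $Y_j$ separately. The function names and the nested-list layout are hypothetical choices. The outer minimization over $p(a|x)$ subject to the cost constraint is omitted.

```python
import math
from itertools import product


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def theorem1_objective(px, pa_given_x, channels):
    """Return I(X;A) + max_j H(X | Y_j, A) in bits.

    px[x]                -- source pmf p(x)
    pa_given_x[x][a]     -- action policy p(a|x)
    channels[j][x][a][y] -- marginal action channel p(y_j | x, a) of decoder j
    """
    nx, na = len(px), len(pa_given_x[0])
    pxa = {(x, a): px[x] * pa_given_x[x][a] for x, a in product(range(nx), range(na))}
    pa = [sum(pxa[(x, a)] for x in range(nx)) for a in range(na)]
    i_xa = entropy(px) + entropy(pa) - entropy(pxa.values())  # I(X;A)

    worst = 0.0
    for chan in channels:
        ny = len(chan[0][0])
        pxay = {(x, a, y): pxa[(x, a)] * chan[x][a][y]
                for x, a, y in product(range(nx), range(na), range(ny))}
        pay = {}
        for (x, a, y), p in pxay.items():
            pay[(a, y)] = pay.get((a, y), 0.0) + p
        h_x_given_ya = entropy(pxay.values()) - entropy(pay.values())  # H(X|Y_j,A)
        worst = max(worst, h_x_given_ya)
    return i_xa + worst


def switch_channel(j):
    """Decoder j (0-indexed) sees Y = X when A = j, and the erasure symbol 2 otherwise."""
    return [[[1.0 if (a == j and y == x) or (a != j and y == 2) else 0.0
              for y in range(3)] for a in range(2)] for x in range(2)]


# Switching example with K = 2 and Y = X ~ Bern(1/2); A uniform and independent of X.
rate = theorem1_objective([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]],
                          [switch_channel(0), switch_channel(1)])
print(round(rate, 6))  # 0.5
```

With $A$ uniform and independent of $X$, each decoder sees $X$ half of the time, which gives the value $H(X)/2$ discussed in the introduction.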



Achievability

As the achievability techniques used are fairly standard (cf. [5]), we give only a sketch of achievability.

Codebook generation:

• Generate $2^{n(I(X;A)+\epsilon)}$ $A^n$ sequences according to $\prod_{i=1}^{n} p(a_i)$.

• Bin the set of all $X^n$ sequences into $2^{n(\max_{j \in [1:K]} H(X|Y_j,A)+\epsilon)}$ bins $B(m_b)$, $m_b \in [1 : 2^{n(\max_{j \in [1:K]} H(X|Y_j,A)+\epsilon)}]$.

Encoding:

• Given a source sequence $x^n$, the encoder looks for an index $m_A \in [1 : 2^{n(I(X;A)+\epsilon)}]$ such that $(x^n, a^n(m_A)) \in \mathcal{T}_\epsilon^{(n)}$. If there is none, it outputs a uniformly random index from $[1 : 2^{n(I(X;A)+\epsilon)}]$. If there is more than one such index, it selects an index uniformly at random from the set of feasible indices. From the covering lemma [5, Chapter 3], the probability of error for this step goes to 0 as $n \to \infty$, since there are $2^{n(I(X;A)+\epsilon)}$ $A^n$ sequences.

• The encoder also looks for the index $m_b \in [1 : 2^{n(\max_{j \in [1:K]} H(X|Y_j,A)+\epsilon)}]$ such that $x^n \in B(m_b)$.

• It then sends the indices $m_b$ and $m_A$ to the decoders via the common link. This step requires a rate of

$$R = \max_{j \in [1:K]} H(X|Y_j, A) + I(X; A) + 2\epsilon.$$

Decoding:

• The decoders take the joint action $a^n(m_A)$ and obtain their side information $Y_j^n$ for $j \in [1 : K]$.

• Decoder $j$ then looks for the unique sequence $\hat{x}^n$ in bin $B(m_b)$ such that $(\hat{x}^n, Y_j^n, a^n(m_A)) \in \mathcal{T}_\epsilon^{(n)}$. An error is declared if there is none or more than one $\hat{x}^n$ sequence satisfying the decoding condition. The probability of error for this step goes to 0 as $n \to \infty$ by the law of large numbers and the fact that the number of bins exceeds $2^{n \max_{j \in [1:K]} H(X|Y_j,A)}$.
Converse

Given an $(n, 2^{nR}, C)$ code, consider the rate constraint for decoder $j$. We have

$$
\begin{aligned}
nR &\ge H(M) \\
&= I(M; X^n) \\
&\overset{(a)}{=} I(A^n; X^n) + I(M; X^n | A^n) \\
&\overset{(b)}{\ge} I(A^n; X^n) + H(M | A^n, Y_j^n) - H(M | A^n, X^n, Y_j^n) \\
&= H(X^n) - H(X^n | A^n) + I(M; X^n | A^n, Y_j^n) \\
&\overset{(c)}{\ge} H(X^n) - H(X^n | A^n) + H(X^n | A^n, Y_j^n) - n\epsilon_n \\
&= H(X^n) - H(X^n | A^n) + H(X^n | A^n) + H(Y_j^n | X^n, A^n) - H(Y_j^n | A^n) - n\epsilon_n \\
&\overset{(d)}{\ge} \sum_{i=1}^{n} \left[ H(X_i) + H(Y_{ji} | X_i, A_i) - H(Y_{ji} | A_i) \right] - n\epsilon_n,
\end{aligned}
$$

where (a) follows from $A^n$ being a function of $M$; (b) follows from the fact that conditioning reduces entropy and the Markov chain $M \to (X^n, A^n) \to Y_j^n$; (c) follows from the assumption of lossless source coding (via Fano's inequality); and (d) follows from the fact that conditioning reduces entropy and that the action channel is a discrete memoryless channel (DMC). Define $Q$ as the standard time-sharing random variable. Observe that $H(X_Q|Q) = H(X_Q) = H(X)$, $H(Y_{jQ}|A_Q, X_Q, Q) = H(Y_{jQ}|A_Q, X_Q) = H(Y_j|A, X)$ and $H(Y_{jQ}|A_Q, Q) \le H(Y_j|A)$. Hence, we can write the lower bound as

$$
\begin{aligned}
nR &\ge n\left( H(X) + H(Y_j|X, A) - H(Y_j|A) - \epsilon_n \right) \\
&= n\left( I(X; A) + H(X|Y_j, A) - \epsilon_n \right).
\end{aligned}
$$

Combining the lower bounds for all $K$ decoders then gives the rate expression in the theorem. Finally, the cost constraint on the action follows from $C \ge \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \Lambda(A_i)\right] = \mathbb{E}\Lambda(A)$.
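The final equality in the single-letter lower bound uses only the chain rule for mutual information; for convenience, the identity is spelled out below.

```latex
\begin{aligned}
H(X) + H(Y_j|X,A) - H(Y_j|A)
  &= H(X) - I(X;Y_j|A) \\
  &= I(X;A) + H(X|A) - I(X;Y_j|A) \\
  &= I(X;A) + H(X|Y_j,A).
\end{aligned}
```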

We now specialize the result in Theorem 1 to the case of source coding with switching dependent side information 
mentioned in the introduction. We consider the more general setting involving K decoders. 

Corollary 1: Source coding with switching-dependent side information and no cost constraints. Let $(X, Y)$ be jointly distributed according to $p(x, y)$. Let $\mathcal{A} = [1 : K]$ and let $p_{Y_1,Y_2,\ldots,Y_K|X,A}$ be defined by $Y_j = Y$ when $A = j$ and $Y_j = e$ (an erasure) otherwise, for $j \in [1 : K]$. Let $\Lambda(a) := 0$ for all $a \in \mathcal{A}$. Then, the minimum rate is given by

$$H(X|Y) + \frac{K-1}{K} I(X; Y).$$

Proof:

The proof of Corollary 1 amounts to an explicit characterization of the distribution $p(a|x)$ in Theorem 1. For each $j \in [1 : K]$, we have, from Theorem 1,

$$
\begin{aligned}
R &\ge H(X|Y_j, A) + I(X; A) \\
&= H(X|Y) + I(X; Y) - I(X; Y_j | A). \qquad (1)
\end{aligned}
$$



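Consider now the sum

$$
\begin{aligned}
\sum_{j=1}^{K} I(X; Y_j | A) &\overset{(a)}{=} \sum_{a \in \mathcal{A}} p(a) I(X; Y | A = a) \\
&= H(Y|A) - H(Y|X, A) \\
&\overset{(b)}{\le} H(Y) - H(Y|X) \\
&= I(X; Y), \qquad (2)
\end{aligned}
$$

where (a) follows from the fact that $Y_j = e$ for $a \neq j$ and $Y_j = Y$ for $a = j$, and (b) follows from the fact that conditioning reduces entropy and the Markov chain $A - X - Y$.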

Next, summing the $K$ lower bounds in (1) and dividing by $K$, we obtain

$$
\begin{aligned}
R &\ge \frac{1}{K}\left( K H(X|Y) + K I(X; Y) - \sum_{j=1}^{K} I(X; Y_j | A) \right) \\
&\ge H(X|Y) + I(X; Y) - \frac{1}{K} I(X; Y) \\
&= H(X|Y) + \frac{K-1}{K} I(X; Y),
\end{aligned}
$$

where we used inequality (2) in the second-to-last step. Finally, noting that this lower bound on the achievable rate is attained in Theorem 1 by choosing $A$ independent of $X$ with $p(a = j) = 1/K$ completes the proof of Corollary 1. ■
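A quick numerical sanity check of the last step of the proof: with $A$ uniform on $[1:K]$ and independent of $X$, we have $I(X;A) = 0$ and each decoder sees $Y$ with probability $1/K$, so $H(X|Y_j,A) = \frac{1}{K}H(X|Y) + \frac{K-1}{K}H(X)$. The sketch below compares this with the closed form of Corollary 1, assuming (for illustration only, not from the paper) $X \sim \text{Bern}(1/2)$ and $Y = X \oplus Z$ with $Z \sim \text{Bern}(0.1)$; the helper names are hypothetical.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def corollary1_rate(h_x, h_x_given_y, k):
    """Closed form H(X|Y) + (K-1)/K * I(X;Y)."""
    return h_x_given_y + (k - 1) / k * (h_x - h_x_given_y)


def uniform_action_rate(h_x, h_x_given_y, k):
    """Theorem 1 objective with A uniform on [1:K] and independent of X.

    I(X;A) = 0; decoder j sees Y with probability 1/K and an erasure otherwise.
    """
    return h_x_given_y / k + (1 - 1 / k) * h_x


h_x, h_x_given_y = 1.0, h2(0.1)  # X ~ Bern(1/2), Y = X xor Bern(0.1) noise
for k in range(1, 6):
    print(k, round(corollary1_rate(h_x, h_x_given_y, k), 4),
          round(uniform_action_rate(h_x, h_x_given_y, k), 4))

# Remark 3.2: K = 2 and Y = X (so H(X|Y) = 0) gives H(X)/2.
print(corollary1_rate(1.0, 0.0, 2))  # 0.5
```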

Remark 3.1: The action can be set to a fixed sequence independent of the source sequence. This is perhaps not 
surprising since there is no cost on the actions. 

Remark 3.2: For K = 2 and X = Y, which is the example given in the introduction, we have R = H(X)/2. 

Remark 3.3: For this class of channels, the achievability scheme in Theorem 1 has a simple and interesting 
"modulo-sum" interpretation. We present a sketch of an alternative scheme for this class of switching channels for 
K = 2. It is straightforward to extend the achievability scheme given below to K decoders. 

Alternative achievability scheme

Split the $X^n$ sequence into two equal parts, $X^{n/2}$ and $X_{n/2+1}^{n}$, and select the fixed action sequence that lets decoder 1 observe $Y^{n/2}$ and decoder 2 observe $Y_{n/2+1}^{n}$. Separately compress each part using standard random binning with side information to obtain $M_1 \in [1 : 2^{n(H(X|Y)/2+\epsilon)}]$ and $M_2 \in [1 : 2^{n(H(X|Y)/2+\epsilon)}]$, corresponding to the first and second halves, respectively. Within each bin, with high probability, there are only about $2^{nI(X;Y)/2}$ typical $X^{n/2}$ sequences, and we represent each of them with an index $M_{j1} \in [1 : 2^{n(I(X;Y)/2+\epsilon)}]$, where $j \in \{1, 2\}$. Send out the indices $M_1$ and $M_2$, which requires a rate of $H(X|Y) + 2\epsilon$. Next, send out the index $M_{11} \oplus M_{21}$, which requires a rate of $I(X;Y)/2 + \epsilon$. From $M_1$ and side information $Y^{n/2}$, decoder 1 can recover $X^{n/2}$ with high probability. Therefore, it can recover $M_{11}$ with high probability. Hence, it can recover $M_{21}$ from $M_{11} \oplus M_{21}$ and, together with $M_2$, recover the sequence $X_{n/2+1}^{n}$. The same analysis holds for decoder 2 with the indices interchanged.
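A toy version of the modulo-sum step, assuming the special case of Remark 3.2 ($K = 2$, $Y = X$, $X$ i.i.d. Bern(1/2)). Here $H(X|Y) = 0$, so the bin indices $M_1, M_2$ are empty, the index $M_{j1}$ of each half is simply the half-block itself, and $\oplus$ becomes bitwise XOR. The function names are hypothetical; this only illustrates why each decoder can unlock the other half, and is not an implementation of random binning.

```python
import random


def encode(x_bits):
    """Send M_11 xor M_21: the bitwise XOR of the two halves (n/2 bits)."""
    if len(x_bits) % 2:
        raise ValueError("block length must be even")
    half = len(x_bits) // 2
    return [a ^ b for a, b in zip(x_bits[:half], x_bits[half:])]


def decode(message, side_info, decoder):
    """Decoder 1 observes the first half (Y^{n/2} = X^{n/2}); decoder 2 the second half."""
    other_half = [m ^ s for m, s in zip(message, side_info)]
    return side_info + other_half if decoder == 1 else other_half + side_info


random.seed(0)
n = 16
x = [random.randint(0, 1) for _ in range(n)]
msg = encode(x)
print(len(msg) / n)  # 0.5 bits per source symbol, i.e. H(X)/2
print(decode(msg, x[: n // 2], 1) == x, decode(msg, x[n // 2 :], 2) == x)  # True True
```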

Corollary 2 gives the optimum rate-cost trade-off for a general switching-dependent side information setup with a cost constraint on the actions for two decoders.

Corollary 2: General switching-dependent side information for two decoders. Define the action channel as follows: $A \in \{0, 1, 2, 3\}$; $A = 0$: $Y_1 = e$, $Y_2 = e$; $A = 1$: $Y_1 = Y$, $Y_2 = e$; $A = 2$: $Y_1 = e$, $Y_2 = Y$; and $A = 3$: $Y_1 = Y$, $Y_2 = Y$. Let $\Lambda(A = j) = C_j$ for $j \in [0 : 3]$. Then, the optimum rate-cost trade-off for this class of channels is given by

$$
\begin{aligned}
R &\ge I(X; A) + \max\{H(X|Y_1, A), H(X|Y_2, A)\} \\
&= I(X; A) + p_0 H(X|A = 0) + \sum_{j=1}^{3} p_j H(X|Y, A = j) \\
&\quad + \max\{p_1 I(X; Y|A = 1), p_2 I(X; Y|A = 2)\},
\end{aligned}
$$

for some $p(a|x)$, where $P\{A = j\} = p_j$, satisfying $\sum_{j=0}^{3} p_j C_j \le C$.
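The equality in Corollary 2 follows by expanding each conditional entropy over the four action values and using $H(X|A=j) = H(X|Y,A=j) + I(X;Y|A=j)$. For decoder 1, which sees an erasure when $A \in \{0, 2\}$:

```latex
\begin{aligned}
H(X|Y_1,A) &= p_0 H(X|A=0) + p_1 H(X|Y,A=1) + p_2 H(X|A=2) + p_3 H(X|Y,A=3) \\
           &= p_0 H(X|A=0) + \sum_{j=1}^{3} p_j H(X|Y,A=j) + p_2 I(X;Y|A=2).
\end{aligned}
```

By symmetry, $H(X|Y_2,A)$ has the same first two terms followed by $p_1 I(X;Y|A=1)$, so the maximum of the two entropies acts only on the last term.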

Remark 3.4: This setup again has a "modulo-sum" interpretation for the term $\max\{p_1 I(X; Y|A = 1), p_2 I(X; Y|A = 2)\}$, and the rate can also be achieved by extending the alternative achievability scheme described after Corollary 1. The scheme involves partitioning the $X^n$ sequence according to the value of $A_i$ for $i \in [1 : n]$. Following that scheme, we let $M_j \in [1 : 2^{n(p_j H(X|Y, A = j)+\epsilon)}]$ for $j \in [1 : 3]$ and $M_0 \in [1 : 2^{n(p_0 H(X|A = 0)+\epsilon)}]$, since neither decoder observes $Y$ when $A = 0$. We first generate a set of $A^n$ codewords according to $\prod_{i=1}^{n} p(a_i)$. Next, for each $A^n$ codeword, define $A^{nj} := \{A_i : A_i = j\}$. Similarly, let $X^{nj} := \{X_i : A_i = j, i \in [1 : n]\}$ be the set of $X$ symbols corresponding to $A^{nj}$. We bin the set of all $X^{nj}$ sequences into bins $B_j(M_j)$, one bin per value of $M_j$. For $j \in \{1, 2\}$, we further bin the set of $x^{nj}$ sequences into $2^{n(p_j I(X;Y|A=j)+\epsilon)}$ bins $B_{j1}(M_{j1})$, $M_{j1} \in [1 : 2^{n(p_j I(X;Y|A=j)+\epsilon)}]$.

For encoding, given an $x^n$ sequence, the encoder first finds an $A^n$ sequence that is jointly typical with $x^n$ and sends out the index corresponding to that $A^n$ sequence. Next, it splits the $x^n$ sequence into four partial sequences $x^{nj}$, $j \in [0 : 3]$, where $x^{nj}$ is the set of $x_i$ corresponding to $A_i = j$. It then finds the corresponding bin indices such that $x^{nj} \in B_j(M_j)$ for $j \in [0 : 3]$ and $x^{nj} \in B_{j1}(M_{j1})$ for $j \in \{1, 2\}$, and sends out the indices $M_0, M_1, M_2, M_3$ and $M_{11} \oplus M_{21}$.

For decoding, we describe only the scheme employed by decoder 1, since the scheme is the same for decoder 2. From the properties of jointly typical sequences and the standard analysis for Slepian-Wolf lossless source coding [6], it is not difficult to see that decoder 1 can recover $x^{n0}$, $x^{n1}$ and $x^{n3}$ with high probability. Recovery of $x^{n1}$ also allows decoder 1 to recover the index $M_{11}$, and hence $M_{21}$ from $M_{11} \oplus M_{21}$. Noting that the rates of $M_{21}$ and $M_2$ sum to $p_2 H(X|A = 2) + 2\epsilon$, it is then easy to see that decoder 1 can recover $x^{n2}$ with high probability.
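The rate accounting of this scheme can be checked mechanically. The sketch below, with $\epsilon$ terms dropped, adds up the index rates described above and compares the total with the expression in Corollary 2. It assumes (our reading, not stated explicitly in the text) that the shorter of $M_{11}$ and $M_{21}$ is zero-padded before the modulo-sum, so that the combined index costs the larger of the two rates; this is where the $\max$ comes from. The function names are hypothetical and the numerical inputs are arbitrary illustrative values, not derived from any particular source distribution.

```python
def scheme_rates(p, h_x_a, h_x_ya, i_xa):
    """Per-index rates (bits/symbol, epsilon terms dropped) of the partition scheme.

    p[j] = P(A=j), h_x_a[j] = H(X|A=j), h_x_ya[j] = H(X|Y,A=j), i_xa = I(X;A).
    """
    rates = {"M_A": i_xa, "M_0": p[0] * h_x_a[0]}  # no side information when A = 0
    for j in (1, 2, 3):
        rates[f"M_{j}"] = p[j] * h_x_ya[j]
    r11 = p[1] * (h_x_a[1] - h_x_ya[1])  # rate of M_11 = p_1 I(X;Y|A=1)
    r21 = p[2] * (h_x_a[2] - h_x_ya[2])  # rate of M_21 = p_2 I(X;Y|A=2)
    rates["M_11 xor M_21"] = max(r11, r21)  # shorter index zero-padded
    return rates


def corollary2_rate(p, h_x_a, h_x_ya, i_xa):
    """I(X;A) + max{H(X|Y_1,A), H(X|Y_2,A)} for the four-action switching channel."""
    h1 = p[0] * h_x_a[0] + p[1] * h_x_ya[1] + p[2] * h_x_a[2] + p[3] * h_x_ya[3]
    h2 = p[0] * h_x_a[0] + p[1] * h_x_a[1] + p[2] * h_x_ya[2] + p[3] * h_x_ya[3]
    return i_xa + max(h1, h2)


# Illustrative values only.
p = [0.1, 0.3, 0.4, 0.2]
h_x_a = [1.0, 0.9, 0.95, 0.8]
h_x_ya = [1.0, 0.6, 0.5, 0.4]  # entry 0 is unused: neither decoder sees Y when A = 0
i_xa = 0.05
total = sum(scheme_rates(p, h_x_a, h_x_ya, i_xa).values())
print(round(total, 6), round(corollary2_rate(p, h_x_a, h_x_ya, i_xa), 6))  # 0.79 0.79
```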

In Corollary 1, we showed that, for the case of switching-dependent side information, the action sequence can be chosen independent of the source $X^n$ when there is no cost constraint on the actions. A natural question is whether the action can still be chosen independent of $X^n$ when a cost constraint on the actions is present. The following example shows that the optimum action sequence in general depends on $X^n$.

Example 1: The action depends on the source statistics when a cost constraint is present. Let $K = 2$ and let $(X, Y)$ be distributed according to an S channel, with $X \sim \text{Bern}(1/2)$, $P(Y = 1|X = 1) = 1$ and $P(Y = 0|X = 0) = 0.2$. Let $A \in \{1, 2\}$, with $Y_1 = Y$ if $A = 1$ and $Y_2 = Y$ if $A = 2$. Let $P(A = 1) = p_1$, $P(X = 0|A = 1) = 1/2 + \delta_1$ and $P(X = 0|A = 2) = 1/2 - \delta_2$. Figure 4 shows the probability distributions between the random variables.




Fig. 4: Probability distributions for the random variables $A$, $X$ and $Y$ used in Example 1.

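Since $X \sim \text{Bern}(1/2)$, we need $p_1(1/2 + \delta_1) + (1 - p_1)(1/2 - \delta_2) = 1/2$, so $\delta_1$ and $\delta_2$ are related by $\delta_2 = p_1\delta_1/(1 - p_1)$. Therefore, we set $\delta_1 = \delta$ and $k = p_1/(1 - p_1)$ for this example, so that $\delta_2 = k\delta$.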

Now, let $\Lambda(A = 1) = 1$, $\Lambda(A = 2) = 0$ and $C = 0.4$. The optimum rate-cost tradeoff in this case may be obtained from Corollary 2 by setting $C_0 = C_3 = \infty$, $C_1 = 1$ and $C_2 = 0$, giving

$$
\begin{aligned}
R &\ge I(X; A) + p_1 H(X|Y, A = 1) + (1 - p_1) H(X|Y, A = 2) \\
&\quad + \max\{p_1 I(X; Y|A = 1), (1 - p_1) I(X; Y|A = 2)\},
\end{aligned}
$$

for some $p(a|x)$, where $P\{A = 1\} = p_1$, satisfying $p_1 \le 0.4$. The problem of finding the optimum action sequence then reduces (after some straightforward algebra) to the following optimization problem:

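$$
\begin{aligned}
\min_{p_1, \delta} \quad & 1 - p_1 H_2(0.5 - \delta) - (1 - p_1) H_2(0.5 - k\delta) \\
& + p_1 H(X|Y, A = 1) + (1 - p_1) H(X|Y, A = 2) \\
& + \max\{p_1 I(X; Y|A = 1), (1 - p_1) I(X; Y|A = 2)\} \\
\text{subject to} \quad & 0 \le p_1 \le 0.4, \\
& -0.5 \le \delta \le 0.5,
\end{aligned}
$$

where $I(X; Y|A = j) = H(X|A = j) - H(X|Y, A = j)$ for $j \in \{1, 2\}$, with

$$
\begin{aligned}
H(X|A = 1) &= H_2(0.5 - \delta), \\
H(X|A = 2) &= H_2(0.5 - k\delta), \\
H(X|Y, A = 1) &= \left((0.5 + \delta)(0.8) + 0.5 - \delta\right) H_2\!\left(\frac{(0.5 + \delta)(0.8)}{(0.5 + \delta)(0.8) + 0.5 - \delta}\right), \\
H(X|Y, A = 2) &= \left((0.5 - k\delta)(0.8) + 0.5 + k\delta\right) H_2\!\left(\frac{(0.5 - k\delta)(0.8)}{(0.5 - k\delta)(0.8) + 0.5 + k\delta}\right),
\end{aligned}
$$

and $H_2(\cdot)$ is the binary entropy function.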

While an exact solution to this (non-convex) optimization problem involves searching over $p_1$ and $\delta$, it is easy to see that if $A$ is restricted to be independent of $X$, which corresponds to restricting $\delta$ to be $0$, then the optimum value of $p_1$ is $0.4$: with $\delta = 0$, the rate reduces to $H(X|Y) + \max\{p_1, 1 - p_1\} I(X; Y)$, which is decreasing in $p_1$ for $p_1 \le 0.5$. Under $p_1 = 0.4$ and $\delta = 0$, we obtain $R_{A \perp X} = 0.9568$. In contrast, setting $p_1 = 0.4$ and $\delta = -0.05$ gives $R = 0.9554$, which shows that the optimum action sequence in general depends on the source $X$ when cost constraints are present.
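The two rates quoted above can be reproduced numerically. The sketch below is our own reconstruction of the computation, assuming the S channel of Example 1 ($P(Y=1|X=0) = 0.8$, $P(Y=1|X=1) = 1$), actions restricted to $\{1, 2\}$ and costs $\Lambda(A=1) = 1$, $\Lambda(A=2) = 0$. The function names and the grid resolutions in `grid_search` are hypothetical and are not necessarily those used to produce Figure 5.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def s_channel_entropies(q0):
    """Return (H(X), H(X|Y)) when P(X=0) = q0 and X passes through the S channel."""
    p_y1 = 0.8 * q0 + (1 - q0)  # P(Y=1); Y=0 can only come from X=0
    return h2(q0), p_y1 * h2(0.8 * q0 / p_y1)


def example1_rate(p1, delta):
    """Objective of the optimization problem in Example 1, in bits per symbol."""
    k = p1 / (1 - p1)
    hx1, hxy1 = s_channel_entropies(0.5 + delta)      # given A = 1
    hx2, hxy2 = s_channel_entropies(0.5 - k * delta)  # given A = 2
    i_xa = 1 - p1 * hx1 - (1 - p1) * hx2              # H(X) = 1
    return (i_xa + p1 * hxy1 + (1 - p1) * hxy2
            + max(p1 * (hx1 - hxy1), (1 - p1) * (hx2 - hxy2)))


def grid_search(cost, n_p=41, n_delta=201):
    """Minimize example1_rate over p1 in [0, cost] and delta in [-0.5, 0.5] on a grid."""
    best = (float("inf"), None, None)
    for i in range(n_p):
        p1 = cost * i / (n_p - 1)
        k = p1 / (1 - p1)
        for t in range(n_delta):
            delta = -0.5 + t / (n_delta - 1)
            if not 0.0 <= 0.5 - k * delta <= 1.0:
                continue  # P(X=0|A=2) must be a probability
            r = example1_rate(p1, delta)
            if r < best[0]:
                best = (r, p1, delta)
    return best


print(round(example1_rate(0.4, 0.0), 4))    # 0.9568 (A independent of X)
print(round(example1_rate(0.4, -0.05), 4))  # 0.9554
print(grid_search(0.4))
```

When the second term of the maximum is the active one (as at both points above), the objective collapses to $1 - p_1 I(X;Y|A=1)$, so at fixed $p_1 = 0.4$ the rate drops exactly when biasing $X$ given $A = 1$ raises $I(X;Y|A=1)$. This is consistent with the explanation that follows.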

An explanation for this observation is as follows. The cost constraint forces decoder 1 to see less of the side information $Y$ than decoder 2. It may therefore make sense to bias the conditional distribution of $X$ given $A = 1$ so that $Y$ conveys more information about the source $X$ to decoder 1, even at the expense of describing the action sequence to the decoders. Roughly speaking, the amount of information conveyed about $X$ by $Y$ may be measured by $I(X; Y)$. Note that under $\delta = 0$, $I(X; Y|A = 1) = 0.108$, whereas under $\delta = -0.05$, $I(X; Y|A = 1) = 0.1116$. A plot of the optimum rate versus cost tradeoff, obtained by searching over a grid of $p_1$ and $\delta$, is shown in Figure 5. The figure also shows the rate obtained if actions are forced to be independent of the source sequence.

IV. Lossy source coding with action at the decoders 

In this section, we first consider the case when causal reconstruction is required, and give the general rate- 
distortion-cost region for K decoders. Next, we consider the case of lossy noncausal reconstruction for two decoders 
and give a general achievability scheme for this case. We then show that our achievability scheme is optimum for 
several special cases. Finally, we discuss some connections between our setting and the complementary delivery 
setting introduced in [7]. 

A. Causal reconstruction for K decoders 

Theorem 2: Causal lossy reconstruction for K decoders

When the decoders are restricted to causal reconstruction [8], $R(D_1, D_2, \ldots, D_K, C)$ is given by

$$R = \min I(X; U),$$

where the minimum is over $p(u|x)$, $A = f(U)$ and reconstruction functions $\hat{x}_j$ for $j \in [1 : K]$ such that

$$
\begin{aligned}
\mathbb{E}\, d_j(X, \hat{x}_j(U, Y_j)) &\le D_j, \quad j \in [1 : K], \\
\mathbb{E}\, \Lambda(A) &\le C.
\end{aligned}
$$

The cardinality of $U$ is upper bounded by $|\mathcal{U}| \le |\mathcal{X}||\mathcal{A}| + K$.

Remark 4.1: Theorem 2 generalizes the corresponding result for one decoder in [2, Theorem 3].

Fig. 5: Rate versus cost constraint for Example 1. It is easy to show operationally that the optimum rate versus cost curve is convex in the cost constraint. When the cost constraint approaches zero, the rate approaches 1, since this case corresponds to decoder 1 not seeing any of the side information. When the cost constraint approaches 0.5, the rate approaches the minimum rate without a cost constraint. The red dashed line shows the rate that would be obtained if actions were forced to be independent of the source. As can be seen from the graph, forcing actions to be independent of the source is in general not optimum when a cost constraint is present. The optimum rate versus cost constraint plot appears to be linear over a range of cost constraints. It can be shown that if the cost constraint is below a threshold, then the optimum rate is a linear function of the cost constraint. However, the plot obtained via numerical simulation appears to be linear in the cost constraint over a wider range than what we obtained by analysis. Performing a more refined analysis to obtain a cost threshold that matches the one observed in simulation appears to be difficult, due to the nature of the optimization problem involved.

Proof: As the achievability scheme is a straightforward extension of the scheme in [2, Theorem 3], we omit the proof of achievability here. For the converse, given a code that satisfies the cost and distortion constraints, we have

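$$
\begin{aligned}
nR &\ge H(M) \\
&= I(X^n; M) \\
&\overset{(a)}{=} \sum_{i=1}^{n} \left( H(X_i) - H(X_i | M, X^{i-1}) \right) \\
&\overset{(b)}{=} \sum_{i=1}^{n} \left( H(X_i) - H(X_i | M, A^n, X^{i-1}) \right) \\
&\overset{(c)}{=} \sum_{i=1}^{n} \left( H(X_i) - H(X_i | M, A^n, X^{i-1}, Y_1^{i-1}, \ldots, Y_K^{i-1}) \right) \\
&\ge \sum_{i=1}^{n} \left( H(X_i) - H(X_i | U_i) \right),
\end{aligned}
$$

where (a) follows from the fact that $X^n$ is a memoryless source; (b) follows from the fact that $A^n$ is a function of $M$; (c) follows from the fact that the action channel $p(y_1, y_2, \ldots, y_K | x, a)$ is memoryless, so that $(Y_1^{i-1}, \ldots, Y_K^{i-1})$ is conditionally independent of $X_i$ given $(M, A^n, X^{i-1})$; and the last step follows from the fact that conditioning reduces entropy and from defining $U_i = (M, Y_1^{i-1}, \ldots, Y_K^{i-1})$. Finally, defining $Q$ to be a random variable uniform over $[1 : n]$, independent of all other random variables, $U = (U_Q, Q)$, $X = X_Q$, $A = A_Q$ and $Y_j = Y_{jQ}$ for $j \in [1 : K]$ then gives the required lower bound on the minimum rate. Further, we have $A = f(U)$, since $A^n$ is a function of $M$. It remains to verify that the cost and distortion constraints are satisfied. Verification of the cost constraint is straightforward. For the distortion constraint, we have, for $j \in [1 : K]$,

$$D_j + \epsilon \ge \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^{n} d_j\left(X_i, g_{j,i}(M, Y_j^{i})\right)\right] = \mathbb{E}\, d_j\left(X, \hat{x}_j(U, Y_j)\right),$$

where we define $\hat{x}_j(U, Y_j) := g_{j,Q}(M, Y_j^{Q})$, which is a function of $(U, Y_j)$ since $Y_j^{Q} = (Y_j^{Q-1}, Y_{jQ})$. This shows that the definition of the auxiliary random variable $U$ satisfies the distortion constraints. Finally, the cardinality of $U$ can be upper bounded by using the support lemma [9]. We require $|\mathcal{X}||\mathcal{A}| - 1$ letters to preserve $p_{X,A}$, which also preserves the cost constraint. In addition, we require $K + 1$ letters to preserve the rate and the $K$ distortion constraints. ■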
We now turn to the case of noncausal reconstruction. For this setting, we give results only for the case of two 
decoders. 

B. Noncausal reconstruction for two decoders 

We first give a general achievability scheme for this setting. 

Theorem 3: A rate $R$ is achievable for lossy source coding with actions at the decoders if

$$
\begin{aligned}
R \ge\; & I(X; A) + \max\{I(X; U|A, Y_1), I(X; U|A, Y_2)\} \\
& + I(X; V_1|U, A, Y_1) + I(X; V_2|U, A, Y_2)
\end{aligned}
$$

for some $p(x)p(a|x)p(u|a, x)p(v_1|u, a, x)p(v_2|u, a, x)p(y_1, y_2|x, a)$ and reconstruction functions $\hat{x}_1$ and $\hat{x}_2$ satisfying

$$
\begin{aligned}
\mathbb{E}\, d_j(X, \hat{x}_j(U, V_j, A, Y_j)) &\le D_j, \quad j = 1, 2, \\
\mathbb{E}\, \Lambda(A) &\le C.
\end{aligned}
$$

We provide a sketch of achievability in Appendix A, since the techniques used are fairly straightforward. As an overview, the encoder first tells the decoders the action sequence to take. It then sends a common description of $X^n$, $U^n$, to both decoders. Based on the action sequence $A^n$ and the common description $U^n$, the encoder sends $V_1^n$ and $V_2^n$ to decoders 1 and 2, respectively. We do not require decoder 1 to decode $V_2^n$, or decoder 2 to decode $V_1^n$.

Theorem 3 is optimum for the following special cases. 

Proposition 1: Heegard-Berger-Kaspi [10], [11] extension. Suppose the Markov chain $(X, A) - (A, Y_1) - (A, Y_2)$ holds. Then, the rate-distortion-cost trade-off region is given by

$$R \ge I(X; A) + I(X; U|A, Y_2) + I(X; V_1|U, A, Y_1)$$

for some $p(x)p(a|x)p(u, v_1|x, a)p(y_1|x, a)p(y_2|y_1, a)$ satisfying

$$
\begin{aligned}
\mathbb{E}\, d_1(X, \hat{x}_1(U, V_1, A, Y_1)) &\le D_1, \\
\mathbb{E}\, d_2(X, \hat{x}_2(U, A, Y_2)) &\le D_2, \\
\mathbb{E}\, \Lambda(A) &\le C.
\end{aligned}
$$

The cardinalities of the auxiliary random variables are upper bounded by $|\mathcal{U}| \le |\mathcal{X}||\mathcal{A}| + 2$ and $|\mathcal{V}_1| \le |\mathcal{U}|(|\mathcal{X}||\mathcal{A}| + 1)$. The achievability for this proposition follows from Theorem 3 by setting $V_2 = \emptyset$ and noting that, since $(X, A) - (A, Y_1) - (A, Y_2)$, the $\max\{\cdot\}$ term simplifies to $I(X; U|A, Y_2)$. We give a proof of the converse as follows.

Converse: Given a code that satisfies the constraints,

$$
\begin{aligned}
nR &\ge H(M) \\
&= H(M, A^n) \\
&= H(A^n) + H(M|A^n) \\
&\ge H(A^n) - H(A^n|X^n) + H(M|A^n, Y_2^n) - H(M|Y_2^n, A^n, X^n) \\
&= I(X^n; A^n) + I(X^n; M|A^n, Y_2^n) \\
&= I(X^n; A^n) + I(X^n; M, Y_1^n|A^n, Y_2^n) - I(X^n; Y_1^n|M, A^n, Y_2^n) \\
&= I(X^n; A^n) + H(X^n|A^n, Y_2^n) - H(X^n|M, Y_1^n, A^n, Y_2^n) - I(X^n; Y_1^n|M, A^n, Y_2^n) \\
&= I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^{n} \left( H(X_i|M, Y_1^n, A^n, Y_2^n, X^{i-1}) + I(X^n; Y_{1i}|M, A^n, Y_2^n, Y_1^{i-1}) \right) \\
&\ge I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^{n} \left( H(X_i|M, Y_1^n, A^n, Y_2^n) + I(X^n; Y_{1i}|M, A^n, Y_2^n, Y_1^{i-1}) \right) \\
&\overset{(a)}{=} I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^{n} \left( H(X_i|M, Y_1^n, A^n, Y_2^n) + I(X_i; Y_{1i}|M, A^n, Y_2^n, Y_1^{i-1}) \right) \\
&= I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^{n} H(X_i|M, Y_1^{i-1}, A^n, Y_2^n) \\
&\quad + \sum_{i=1}^{n} \left( I(X_i; Y_{1,i}^{n}|M, A^n, Y_2^n, Y_1^{i-1}) - I(X_i; Y_{1i}|M, A^n, Y_2^n, Y_1^{i-1}) \right) \\
&= I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^{n} H(X_i|M, Y_1^{i-1}, A^n, Y_2^n) \\
&\quad + \sum_{i=1}^{n} I(X_i; Y_{1,i+1}^{n}|M, A^n, Y_2^n, Y_1^{i}) \\
&\overset{(b)}{=} I(X^n; A^n) + H(X^n|A^n, Y_2^n) - \sum_{i=1}^{n} H(X_i|M, Y_1^{i-1}, A^n, Y_2^n) \\
&\quad + \sum_{i=1}^{n} I(X_i; Y_{1,i+1}^{n}|M, A^n, Y_2^{n \setminus i}, Y_{1i}, Y_1^{i-1}),
\end{aligned}
$$

where $Y_{1,i}^{n} := (Y_{1i}, \ldots, Y_{1n})$ and $Y_2^{n \setminus i} := (Y_2^{i-1}, Y_{2,i+1}^{n})$; (a) follows from the Markov chain $X^{n \setminus i} - (M, A^n, Y_2^n, Y_1^{i-1}, X_i) - Y_{1i}$; and (b) follows from the Markov chain assumption $X_i - (A_i, Y_{1i}) - (A_i, Y_{2i})$. Consider now

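$$
\begin{aligned}
I(X^n; A^n) + H(X^n|A^n, Y_2^n) &= I(X^n; A^n) + H(X^n, Y_2^n|A^n) - H(Y_2^n|A^n) \\
&= H(X^n) + H(Y_2^n|A^n, X^n) - H(Y_2^n|A^n) \\
&\ge \sum_{i=1}^{n} \left( H(X_i) + H(Y_{2i}|X_i, A_i) - H(Y_{2i}|A_i) \right).
\end{aligned}
$$

Hence,

$$
\begin{aligned}
nR \ge\; & \sum_{i=1}^{n} \left( H(X_i) + H(Y_{2i}|X_i, A_i) - H(Y_{2i}|A_i) \right) - \sum_{i=1}^{n} H(X_i|M, Y_1^{i-1}, A^n, Y_2^n) \\
& + \sum_{i=1}^{n} I(X_i; Y_{1,i+1}^{n}|M, A^n, Y_2^{n \setminus i}, Y_{1i}, Y_1^{i-1}).
\end{aligned}
$$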

Define now $Q$ to be a random variable uniform over $[1 : n]$, independent of all other random variables; $X = X_Q$, $Y_1 = Y_{1Q}$, $Y_2 = Y_{2Q}$, $A = A_Q$, $U_i = (M, Y_1^{i-1}, A^{n \setminus i}, Y_2^{n \setminus i})$, $V_i = Y_{1,i+1}^{n}$, $U = (U_Q, Q)$ and $V = V_Q$. Then, we have

$$
\begin{aligned}
R &\ge H(X) + H(Y_2|X, A) - H(Y_2|A, Q) - H(X|A, Y_2, U) + I(X; V|A, Y_1, U) \\
&\ge H(X) + H(Y_2|X, A) - H(Y_2|A) - H(X|A, Y_2, U) + I(X; V|A, Y_1, U) \\
&= I(X; A) + I(X; U|A, Y_2) + I(X; V|A, Y_1, U),
\end{aligned}
$$

which is the bound in Proposition 1 with $V_1 = V$. It remains to verify that the definitions of $U$, $V$ and $A$ satisfy the distortion and cost constraints, which is straightforward. The proof of the cardinality bounds follows from standard techniques. ■

The next proposition extends our results for the case of switching-dependent side information to a class of lossy source coding problems with switching-dependent side information.

Proposition 2: Special case of switching dependent side information. Let $Y_1 = X$, $Y_2 = Y$ if $A = 1$ and $Y_1 = Y$, $Y_2 = X$ if $A = 2$, and suppose that for all $x$ there exist $\hat{x}_1$ and $\hat{x}_2$ such that $d_1(x, \hat{x}_1) = 0$ and $d_2(x, \hat{x}_2) = 0$. Then, the rate-distortion-cost trade-off region is given by

$$
R \ge I(X;A) + \max\{P(A=2)\,I(X;U_1|A=2,Y),\ P(A=1)\,I(X;U_2|A=1,Y)\}
$$

for some $p(x,y)p(a|x)p(u_1|x,a=2)p(u_2|x,a=1)$ satisfying

$$
\begin{aligned}
P(A=2)\,E[d_1(X,\hat{X}_1(Y,U_1))|A=2] &\le D_1, \\
P(A=1)\,E[d_2(X,\hat{X}_2(Y,U_2))|A=1] &\le D_2, \\
E\,\Lambda(A) &\le C.
\end{aligned}
$$

The cardinality of the auxiliary random variables is upper bounded by $|\mathcal{U}_1| \le |\mathcal{X}| + 1$ and $|\mathcal{U}_2| \le |\mathcal{X}| + 1$.
Achievability follows from Theorem 3 by setting $V_1 = V_2 = \emptyset$, $U = U_2$ if $A = 1$ and $U = U_1$ if $A = 2$. We give the proof of the converse as follows.

Converse: Given a code that satisfies the cost and distortion constraints, consider the rate required for decoder 
1. We have 

$$
\begin{aligned}
nR &\ge H(M) \\
&= H(M,A^n) \\
&= H(A^n) + H(M|A^n) \\
&\ge H(A^n) - H(A^n|X^n) + H(M|A^n,Y_1^n) - H(M|Y_1^n,A^n,X^n) \\
&= I(X^n;A^n) + I(X^n;M|A^n,Y_1^n) \\
&= I(X^n;A^n) + H(X^n,Y_1^n|A^n) - H(Y_1^n|A^n) - H(X^n|M,A^n,Y_1^n) \\
&= H(X^n) + H(Y_1^n|A^n,X^n) - H(Y_1^n|A^n) - H(X^n|M,A^n,Y_1^n) \\
&\ge \sum_{i=1}^n \left( H(X_i) + H(Y_{1i}|X_i,A_i) - H(Y_{1i}|A_i) \right) - \sum_{i=1}^n H(X_i|M,A^n,Y_1^n).
\end{aligned}
$$

As before, we define $Q$ to be a uniform random variable over $[1:n]$, independent of all other random variables. We then have

$$
\begin{aligned}
R &\ge H(X_Q|Q) + H(Y_{1Q}|X_Q,A_Q,Q) - H(Y_{1Q}|A_Q,Q) - H(X_Q|M,A^n,Y_1^n,Q) \\
&\overset{(a)}{\ge} H(X) + H(Y_1|X,A) - H(Y_1|A) - H(X|M,A^n,Y_1^n,Q) \\
&\overset{(b)}{=} I(X;A) + I(X;U_1|Y_1,A).
\end{aligned}
$$

(a) follows from the discrete memoryless nature of the action channel and the fact that conditioning reduces entropy; (b) follows from defining $U_{1i} = (M, A^{n\setminus i}, Y_1^{n\setminus i})$ and $U_1 = (U_{1Q}, Q)$. Expanding the second term in terms of $A$, and using the observation that $Y_1 = X$ when $A = 1$ (so that $I(X;U_1|Y_1,A=1) = 0$) and $Y_1 = Y$ when $A = 2$, we obtain

$$R \ge I(X;A) + P(A=2)\,I(X;U_1|Y,A=2).$$

For decoder 2, the same steps with side information $Y_2$ instead of $Y_1$, and defining $U_{2i} = (M, A^{n\setminus i}, Y_2^{n\setminus i})$ and $U_2 = (U_{2Q}, Q)$, yield

$$R \ge I(X;A) + P(A=1)\,I(X;U_2|Y,A=1).$$

Taking the maximum of the two lower bounds yields

$$R \ge I(X;A) + \max\{P(A=2)\,I(X;U_1|Y,A=2),\ P(A=1)\,I(X;U_2|Y,A=1)\}$$

for some $p(a|x)p(u_1,u_2|x,a)$. Verifying the cost constraint is straightforward. As for the distortion constraint, we have for decoder 1

$$
\begin{aligned}
\frac{1}{n} E\, d_1(X^n, \hat{x}_1^n(M,A^n,Y_1^n)) &= E\, d_1(X, \hat{x}_1(U_1,A,Y_1)) \\
&= P(A=2)\, E[d_1(X,\hat{x}_1(U_1,Y))|A=2].
\end{aligned}
$$

The same arguments hold for decoder 2. It remains to show that the probability distribution can be restricted to the form $p(a|x)p(u_1|a,x)p(u_2|a,x)$. Observe that $P(A=2)\,E[d_1(X,\hat{x}_1(U_1,Y))|A=2]$ and $P(A=2)\,I(X;U_1|Y,A=2)$ depend on the joint distribution only through the marginal $p(a,u_1|x)$, and that $P(A=1)\,E[d_2(X,\hat{x}_2(U_2,Y))|A=1]$ and $P(A=1)\,I(X;U_2|Y,A=1)$ depend on the joint distribution only through the marginal $p(a,u_2|x)$. Hence, restricting the joint distribution to the form $p(a|x)p(u_1|a,x)p(u_2|a,x)$ does not affect the rate, cost or distortion constraints. It remains to bound the cardinality of the auxiliary random variables, which follows from standard techniques. This completes the proof of the converse. ■

Remark 4.2: The condition on the distortion measures serves only to remove distortion offsets, and it can be removed in a fairly straightforward manner.

Remark 4.3: As in the case of lossless source coding with switching dependent side information, the terms in the max expression admit a modulo sum interpretation. When $A = 1$, the encoder codes for decoder 2, resulting, after binning, in an index $I_2$ for the codeword $U_2^n$. When $A = 2$, the encoder codes for decoder 1, resulting, after binning, in an index $I_1$ for the codeword $U_1^n$. The encoder sends the modulo sum of the two indices, $I_1 \oplus I_2$, along with the index of the action codeword. Decoder 1 observes the $X_i$ symbols at the positions where $A_i = 1$ (since $Y_1 = X$ there) and hence knows the index $I_2$. Therefore, it can recover its desired index $I_1$ from $I_1 \oplus I_2$. A similar analysis holds for decoder 2.
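The index arithmetic in Remark 4.3 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions:

- bin indices are nonnegative integers, and the Wyner-Ziv binning that produces $I_1$ and $I_2$ is abstracted away;
- the encoder's codeword-selection rule is deterministic, so a decoder holding the relevant source symbols can recompute the other index;
- the function names are hypothetical.

The sketch shows that a second XOR strips off the known index. It also shows that the transmitted index needs only as many bits as the longer of the two, which mirrors the max in the rate expression.

```python
def encoder_message(i1: int, i2: int) -> int:
    """Send I1 xor I2 (bitwise modulo-2 sum) instead of both indices."""
    return i1 ^ i2


def recover_index(message: int, known_index: int) -> int:
    """A decoder that can recompute one index from its side information
    removes it with a second XOR, since (I1 ^ I2) ^ I2 == I1."""
    return message ^ known_index


def message_bits(bits_i1: int, bits_i2: int) -> int:
    """Length of the XOR-ed index: the shorter index is zero-padded, so the
    cost is the max of the two lengths rather than their sum."""
    return max(bits_i1, bits_i2)


# Decoder 1 knows X where A = 1, so it can recompute I2;
# decoder 2 knows X where A = 2, so it can recompute I1.
i1, i2 = 0b10110010, 0b0110
msg = encoder_message(i1, i2)
assert recover_index(msg, i2) == i1   # decoder 1 recovers I1
assert recover_index(msg, i1) == i2   # decoder 2 recovers I2
print(message_bits(8, 4))             # 8 bits, not 12
```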

Example 2: Binary source with Hamming distortion and no cost constraint. Let $Y = \emptyset$ and $X \sim \mathrm{Bern}(1/2)$. Assume no cost on the actions taken, $\Lambda(A=1) = \Lambda(A=2) = 0$, and let the distortion measure be Hamming. Then, the rate distortion trade-off evaluates to

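$$
R = \min_{\alpha} \max\left\{ \alpha\left(1 - H_2\!\left(\tfrac{D_1}{\alpha}\right)\right)\mathbb{1}\!\left(\tfrac{D_1}{\alpha} < \tfrac{1}{2}\right),\ (1-\alpha)\left(1 - H_2\!\left(\tfrac{D_2}{1-\alpha}\right)\right)\mathbb{1}\!\left(\tfrac{D_2}{1-\alpha} < \tfrac{1}{2}\right) \right\},
$$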

where $\mathbb{1}(\cdot)$ denotes the indicator function. As a check, note that if $D_1, D_2 \to 0$, then the rate obtained is 1/2, which agrees with the rate obtained in Corollary 1 for the lossless case. The result follows from explicitly evaluating the region in Proposition 2. Let $P(A = 2) = \alpha$. From Proposition 2, we have

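$$
\begin{aligned}
R &\ge I(X;A) + P(A=2)\,I(X;U_1|Y,A=2) \\
&= 1 - (1-\alpha)H(X|A=1) - \alpha H(X|A=2) + \alpha H(X|A=2) - \alpha H(X|U_1,A=2) \\
&\ge \alpha - \alpha H(X|U_1,A=2) \\
&\ge \alpha\left(1 - H(X \oplus \hat{X}_1|U_1,A=2)\right) \\
&\ge \alpha\left(1 - H_2(D_1/\alpha)\right)\mathbb{1}(D_1/\alpha < 1/2).
\end{aligned}
$$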

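The last step follows from the following observations:

(i) if $D_1/\alpha \ge 1/2$, then we lower bound $R$ by 0;

(ii) if $D_1/\alpha < 1/2$, then the distortion constraint $\alpha E[d(X,\hat{X}_1)|A=2] \le D_1$ gives $P(X \ne \hat{X}_1|A=2) \le D_1/\alpha$, so that $H(X \oplus \hat{X}_1|U_1,A=2) \le H(X \oplus \hat{X}_1|A=2) \le H_2(D_1/\alpha)$.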
The other bound is derived in the same manner. Achievability is straightforward, since we can choose $U_1 = \hat{X}_1$ when $A = 2$ and $U_2 = \hat{X}_2$ when $A = 1$. In this example, the action sequence is independent of the source. Unlike the case of lossless source coding, however, $P(A=1)$ is not in general equal to $P(A=2)$; it depends on the distortion constraints for the individual decoders. A surface plot of the rate versus the distortion constraints for the two decoders is shown in Figure 6.
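The closed form above is easy to evaluate numerically. The sketch below is a minimal illustration that uses a simple grid search over $\alpha = P(A = 2)$; the grid size is an arbitrary choice, and a branch with zero weight is taken to contribute zero. It reproduces the sanity checks in the text: $R = 1/2$ at $D_1 = D_2 = 0$, and $R = 0$ once either distortion reaches 1/2.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits, with the convention h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def branch(weight: float, d: float) -> float:
    """weight * (1 - h2(d / weight)) * 1(d / weight < 1/2); 0 when weight == 0."""
    if weight <= 0.0:
        return 0.0
    ratio = d / weight
    return weight * (1.0 - h2(ratio)) if ratio < 0.5 else 0.0


def rate(d1: float, d2: float, grid: int = 10_000) -> float:
    """Approximate R(D1, D2) of Example 2 by a grid search over alpha = P(A = 2)."""
    best = math.inf
    for k in range(grid + 1):
        alpha = k / grid
        best = min(best, max(branch(alpha, d1), branch(1.0 - alpha, d2)))
    return best


print(rate(0.0, 0.0))   # 0.5, matching the lossless value
print(rate(0.5, 0.0))   # 0.0
print(rate(0.1, 0.1))   # about 0.139, attained at alpha = 1/2
```

Because the first branch increases in $\alpha$ and the second decreases, the minimum of the max sits where the two branches cross. For symmetric distortions this crossing is at $\alpha = 1/2$.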



Fig. 6: Plot of rate versus distortions. The figure plots the rate distortion surface $R(D_1, D_2)$ for Example 2. There is no side information, i.e., $Y = \emptyset$, and $X \sim \mathrm{Bern}(1/2)$. There is no cost on the actions taken, $\Lambda(A=1) = \Lambda(A=2) = 0$, and the distortion measure is Hamming. Note that if either $D_1$ or $D_2 \to 0.5$, $R$ approaches 0, and if $D_1 = D_2 = 0$, the rate is 0.5.



C. Connections with Complementary Delivery 

In the preceding subsections, we considered several cases of switching dependent side information in which the achievability scheme has a simple "modulo sum" interpretation for the terms in the max function. This interpretation is not unique to our setup, and in this subsection we consider the complementary delivery setting [7], in which it also arises. Formally, the complementary delivery problem is a special case of our setting. It is obtained by letting the action be trivial, the source be the pair $(X, Y)$, the side information be $Y_1 = X$ and $Y_2 = Y$ with probability one, $\Lambda(A) = 0$, $d_1((X,Y),\hat{X}_1) = d_1'(Y,\hat{X}_1)$ and $d_2((X,Y),\hat{X}_2) = d_2'(X,\hat{X}_2)$. For notational convenience, in this subsection we write $\hat{Y}$ in place of $\hat{X}_1$ and $\hat{X}$ in place of $\hat{X}_2$. This setting is shown in Figure 7. In [7], the following achievable rate was established:

$$R(D_1,D_2) \ge \max\{I(U;Y|X),\ I(U;X|Y)\}, \qquad (3)$$

for some $p(u|x,y)$ satisfying $E\,d_1(Y,\hat{Y}(U,X)) \le D_1$ and $E\,d_2(X,\hat{X}(U,Y)) \le D_2$.

Our achievability scheme in Theorem 3 generalizes this scheme when specialized to the complementary delivery setting, but we do not yet know whether our achievable rate can be strictly smaller for the same distortions. However, by taking a modulo sum interpretation for the terms in the $\max\{\cdot\}$ function in (3), as we have done for several examples in this paper, we are able to give simple proofs and explicit characterizations for two canonical cases: the Quadratic Gaussian and the doubly symmetric binary Hamming distortion complementary delivery problems. Characterizations for these two settings also appear independently in [12]. Our approach differs from that in [12] and, we believe, will be of interest to readers. Furthermore, by taking the "modulo sum" interpretation, we establish the following, which may be a useful observation in practice: "For the Quadratic Gaussian complementary delivery problem, if one has a good code (in the sense of achieving the optimum rate distortion tradeoff) for the point-to-point Wyner-Ziv [1] Quadratic Gaussian setup, then a simple modification turns it into a good code for the Quadratic Gaussian complementary delivery problem." A similar observation holds for the doubly symmetric binary Hamming distortion case. We first consider the Quadratic Gaussian case.

Fig. 7: Complementary delivery setting

Proposition 3: Quadratic Gaussian complementary delivery. Let $Y = X + Z$, where $Z \sim \mathcal{N}(0, N)$ is independent of $X \sim \mathcal{N}(0, P)$, and let both distortion measures be mean squared error. Let $P' = PN/(P+N)$. The rate distortion region for the non-trivial constraints $D_2 < P'$ and $D_1 < N$ is given by

$$
R(D_1,D_2) = \max\left\{\frac{1}{2}\log\frac{N}{D_1},\ \frac{1}{2}\log\frac{P'}{D_2}\right\}.
$$

Proof: 
Converse 

The converse follows from straightforward cutset bound arguments. The reader may notice that the expression 
given above is the maximum of the Quadratic Gaussian Wyner-Ziv [1] rate to decoder 1 and the Quadratic Gaussian 
Wyner-Ziv rate to decoder 2, or equivalently the maximum of the two cutset bounds. Clearly, this rate is the lowest 
possible for the given distortions. 
Achievability 

We now show that this rate is also achievable using a modulo sum interpretation of (3). Consider first encoding for decoder 1. From the Quadratic Gaussian Wyner-Ziv result, we know that side information at the encoder is redundant. Therefore, without loss of optimality, the encoder can code for decoder 1 using only $Y^n$, resulting in the codeword $U_Y^n$ and the corresponding index $I_Y$ after binning. Similarly, the encoder can code for decoder 2 using only $X^n$, resulting in the codeword $U_X^n$ and the index $I_X$ after binning. The encoder then sends out the index $I_X \oplus I_Y$. Since decoder 1 has the $X^n$ sequence as side information, it knows the index $I_X$ and can therefore recover $I_Y$ from $I_X \oplus I_Y$. The same decoding scheme works for decoder 2. Therefore, we have shown the achievability of the given rate expression. We note further that this scheme corresponds to setting $U = (U_X, U_Y)$ such that $U_X - X - Y - U_Y$ in rate expression (3). ■
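As a quick numerical companion to Proposition 3, the sketch below evaluates the rate as the larger of the two Quadratic Gaussian Wyner-Ziv rates. Decoder 1 uses $\mathrm{Var}(Y|X) = N$ and decoder 2 uses $\mathrm{Var}(X|Y) = P'$. The following are assumptions rather than anything stated in the text:

- rates are measured in bits (log base 2);
- each term is clipped at zero, which extends the formula outside the non-trivial region $D_1 < N$, $D_2 < P'$;
- the function names are illustrative.

```python
import math


def wz_gaussian_rate(cond_var: float, distortion: float) -> float:
    """Quadratic Gaussian Wyner-Ziv rate in bits: max(0, 0.5 * log2(cond_var / D))."""
    return max(0.0, 0.5 * math.log2(cond_var / distortion))


def complementary_delivery_rate(P: float, N: float, d1: float, d2: float) -> float:
    """Rate of Proposition 3 for Y = X + Z, X ~ N(0, P), Z ~ N(0, N)."""
    p_prime = P * N / (P + N)                # Var(X | Y)
    r1 = wz_gaussian_rate(N, d1)             # decoder 1: estimates Y, side info X
    r2 = wz_gaussian_rate(p_prime, d2)       # decoder 2: estimates X, side info Y
    return max(r1, r2)                       # one XOR-ed index serves both


print(complementary_delivery_rate(P=1.0, N=1.0, d1=0.25, d2=0.25))  # 1.0
```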

Remark 4.4: As shown in our proof of achievability, if we have a good practical code for the Wyner-Ziv Quadratic 
Gaussian problem, then we also have a good practical code for the complementary delivery problem setting. We 
first develop two point to point codes: one for the Wyner-Ziv Quadratic Gaussian case with X as the source and 
Y as the side information, and another for the case where Y is the source and X is the side information. A good 
code for the complementary delivery setting is then obtained by taking the modulo sum of the indices produced by 
these two point to point codes. 
We now turn to the case of doubly symmetric binary sources with Hamming distortion. Here, the achievability scheme involves taking the modulo sum of the sources $X^n$ and $Y^n$.

Proposition 4: Doubly symmetric binary source with Hamming distortion. Let $X \sim \mathrm{Bern}(1/2)$, $Y \sim \mathrm{Bern}(1/2)$, $X \oplus Y \sim \mathrm{Bern}(p)$, and let both distortion measures be Hamming distortion. Assume, without loss of generality, that $D_1, D_2 \le p$. Then,

$$
R(D_1,D_2) = \max\{H(p) - H(D_1),\ H(p) - H(D_2)\}.
$$

Proof: The converse again follows from straightforward cut-set bounds by considering decoders 1 and 2 individually. For the achievability scheme, let $Z = X \oplus Y$ and assume that $D_1 \le D_2$. Since $Z$ is i.i.d. Bern($p$), using a point-to-point code for $Z$ at distortion $D_1$, we obtain a rate of $H(p) - H(D_1)$. Denote the reconstruction of $Z$ at time $i$ by $\hat{Z}_i$. Decoder 1 reconstructs $Y_i$ by $\hat{Y}_i = X_i \oplus \hat{Z}_i$ for $i \in [1:n]$. Similarly, decoder 2 reconstructs $X_i$ by $\hat{X}_i = Y_i \oplus \hat{Z}_i$ for $i \in [1:n]$. To verify that the distortion constraint holds, note that $d_1(Y_i, X_i \oplus \hat{Z}_i) = Y_i \oplus X_i \oplus \hat{Z}_i = Z_i \oplus \hat{Z}_i$. Since $\hat{Z}^n$ is produced by a code that achieves distortion $D_1$, $\hat{Y}^n$ satisfies the distortion constraint for decoder 1. The same analysis holds for decoder 2, whose distortion $D_1$ meets its constraint since $D_1 \le D_2$. ■
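The reconstruction rule in the proof is easy to check in code. The sketch below is an illustration, not the authors' implementation. `toy_quantizer` is a hypothetical stand-in for a good point-to-point rate distortion code for $Z$; it is a simple majority-vote quantizer and far from optimal. The source parameters are arbitrary. The sketch confirms the identity used in the proof: both decoders incur exactly the Hamming distortion between $Z^n$ and $\hat{Z}^n$.

```python
import random


def hamming_distortion(a, b):
    """Fraction of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def toy_quantizer(z, block=3):
    """Hypothetical stand-in for a rate distortion code for Z: each block of
    bits is replaced by its majority value (not an optimal code)."""
    z_hat = []
    for start in range(0, len(z), block):
        chunk = z[start:start + block]
        value = 1 if 2 * sum(chunk) > len(chunk) else 0
        z_hat.extend([value] * len(chunk))
    return z_hat


def xor_scheme(x, y, quantizer=toy_quantizer):
    """Encoder describes Z = X xor Y; each decoder XORs Z_hat with its side info."""
    z = [xi ^ yi for xi, yi in zip(x, y)]
    z_hat = quantizer(z)
    y_hat = [xi ^ zh for xi, zh in zip(x, z_hat)]   # decoder 1: has X, wants Y
    x_hat = [yi ^ zh for yi, zh in zip(y, z_hat)]   # decoder 2: has Y, wants X
    return z, z_hat, y_hat, x_hat


rng = random.Random(0)
n, p = 30_000, 0.2
x = [rng.randint(0, 1) for _ in range(n)]
y = [xi ^ int(rng.random() < p) for xi in x]         # X xor Y ~ Bern(p)
z, z_hat, y_hat, x_hat = xor_scheme(x, y)
# Both decoders see exactly the Hamming distortion of Z_hat against Z.
assert hamming_distortion(y, y_hat) == hamming_distortion(z, z_hat)
assert hamming_distortion(x, x_hat) == hamming_distortion(z, z_hat)
```

Per Remark 4.5, replacing `toy_quantizer` with a good binary rate distortion code is all that the scheme needs.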
Remark 4.5: In this case, we only need a good code for the standard point to point rate distortion problem for a 
binary source. A good rate distortion code for a binary source is also a good code for the doubly symmetric binary 
source with Hamming distortion complementary delivery problem. 

Remark 4.6: In our scheme, the reconstruction symbols at time i depend only on the received message and the side 
information at the decoder at time i. Therefore, for this case, the rate distortion region for causal reconstruction [8] 
is the same as the rate distortion region for noncausal reconstruction. 



V. Actions taken at the encoder 

We now turn to the case where the encoder takes actions (Figure 3) instead of the decoders. When the actions are taken at the encoder, the general rate-cost-distortion tradeoff region is open even for the case of a single decoder. Special cases that have been characterized include the lossless case [2]. In this section, we consider a special case of lossless source coding with $K$ decoders in which we can characterize the rate-cost tradeoff region.

Theorem 4: Special case of lossless source coding with actions taken at the encoder. Let the action channel be given by the conditional distribution $P_{Y_1,Y_2,\ldots,Y_K|X,A}$. Assume further that $A = f_1(Y_1) = f_2(Y_2) = \cdots = f_K(Y_K)$. Then, the minimum rate required for lossless source coding with actions taken at the encoder and cost constraint $C$ is given by

$$
R = \min \left[ \max_{j \in [1:K]} H(X|Y_j,A) - H(A|X) \right]^+,
$$

where the minimization is over joint distributions $p(x)p(a|x)p(y_1,y_2,\ldots,y_K|x,a)$ such that $E\,\Lambda(A) \le C$.
Proof: 

Converse: The proof of the converse is a straightforward extension of the single decoder case given in [2]. We give the proof here for completeness. Consider the rate required for decoder $j$.

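$$
\begin{aligned}
nR &\ge H(M) \\
&\ge H(M,X^n|Y_j^n) - H(X^n|M,Y_j^n) \\
&\ge H(M,X^n|Y_j^n) - n\epsilon_n \\
&\overset{(a)}{=} H(X^n|Y_j^n) - n\epsilon_n \\
&\overset{(b)}{=} H(X^n) + H(Y_j^n|X^n,A^n) - H(Y_j^n) - n\epsilon_n \\
&\ge \sum_{i=1}^n \left( H(X_i) + H(Y_{ji}|X_i,A_i) - H(Y_{ji}) \right) - n\epsilon_n,
\end{aligned}
$$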

where (a) follows from the fact that $M$ is a function of $X^n$, and (b) follows from $A^n$ being a function of $X^n$. The last step follows from $X^n$ being a discrete memoryless source, the action channel being memoryless, and the fact that conditioning reduces entropy. As before, we define $Q$ to be a uniform random variable over $[1:n]$, independent of all other random variables, to obtain

$$
\begin{aligned}
R &\ge H(X) + H(Y_j|X,A) - H(Y_j) - \epsilon_n \\
&= H(X) + H(Y_j,X|A) - H(X|A) - H(Y_j) - \epsilon_n \\
&= H(X|A,Y_j) + I(X;A) - I(Y_j;A) - \epsilon_n \\
&= H(X|A,Y_j) - H(A|X) - \epsilon_n.
\end{aligned}
$$

The last step follows from the fact that $A = f_j(Y_j)$, so that $I(Y_j;A) = H(A)$ and hence $I(X;A) - I(Y_j;A) = -H(A|X)$. Combining the lower bounds over the $K$ decoders then gives the rate stated in the theorem.

Achievability: We give a sketch of achievability since the techniques used are relatively standard. Assume first that $R > 0$. We first bin the set of $X^n$ sequences into $2^{n(\max_{j\in[1:K]} H(X|Y_j,A)+\epsilon)}$ bins $\mathcal{B}(m_x)$, $m_x \in [1 : 2^{n(\max_{j\in[1:K]} H(X|Y_j,A)+\epsilon)}]$. Given an $x^n$ sequence, we first find the bin index $m_x$ such that $x^n \in \mathcal{B}(m_x)$. We then split $m_x$ into two sub-messages, $m_{xr} \in [1 : 2^{n(\max_{j\in[1:K]} H(X|Y_j,A) - H(A|X) + 2\epsilon)}]$ and $m_{xa} \in [1 : 2^{n(H(A|X) - \epsilon)}]$. The sub-message $m_{xr}$ is transmitted over the noiseless link, giving us the rate stated in the theorem. As for $m_{xa}$, we send it through the action channel by treating the action channel as a channel with i.i.d. state $X$ noncausally known at the transmitter (the encoder, which chooses $A^n$). We can therefore use Gel'fand-Pinsker coding [13] for this channel.

Each decoder first decodes $m_{xa}$ from its side information $Y_j^n$. From the condition that $A = f_j(Y_j)$ for all $j$, we have $H(A|X) - \epsilon = I(Y_j;A) - I(X;A) - \epsilon$. From the analysis of Gel'fand-Pinsker coding, since the rate of $m_{xa}$ is $I(Y_j;A) - I(X;A) - \epsilon$, the probability of error in decoding $m_{xa}$ goes to zero as $n \to \infty$. The decoder then reconstructs $m_x$ from $m_{xr}$ and $m_{xa}$. It then finds the unique $x^n \in \mathcal{B}(m_x)$ that is jointly typical with $Y_j^n$ and $A^n$. Note that, due to Gel'fand-Pinsker coding, the true $x^n$ sequence is jointly typical with $Y_j^n$ and $A^n$ with high probability. Therefore, the probability of error in this decoding step goes to zero as $n \to \infty$, since we have $2^{n(\max_{j\in[1:K]} H(X|Y_j,A)+\epsilon)}$ bins.

For the case where $R = 0$, we send the entire message through the action channel. ■
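Splitting $m_x$ into $(m_{xr}, m_{xa})$ is ordinary mixed-radix index arithmetic. The sketch below is an illustration under these assumptions:

- indices are 0-based, whereas the text uses $[1 : \cdot]$;
- the bin count is an exact multiple of the number of $m_{xa}$ values;
- the Gel'fand-Pinsker transmission of $m_{xa}$ is not modelled;
- the function names are hypothetical.

```python
from typing import Tuple


def split_index(m_x: int, num_xa: int) -> Tuple[int, int]:
    """Split a 0-based bin index m_x into (m_xr, m_xa), with 0 <= m_xa < num_xa.
    m_xr goes over the noiseless link; m_xa goes through the action channel."""
    m_xr, m_xa = divmod(m_x, num_xa)
    return m_xr, m_xa


def merge_index(m_xr: int, m_xa: int, num_xa: int) -> int:
    """Decoder side: rebuild m_x once m_xa has been decoded from Y_j^n."""
    return m_xr * num_xa + m_xa


def link_rate(max_cond_entropy: float, h_a_given_x: float, eps: float) -> float:
    """Exponent of the m_xr range: (max_j H(X|Y_j,A) + eps) - (H(A|X) - eps)."""
    return (max_cond_entropy + eps) - (h_a_given_x - eps)


# Toy sizes: 16 values of m_xr times 8 values of m_xa = 128 bins.
num_xr, num_xa = 16, 8
for m_x in range(num_xr * num_xa):
    m_xr, m_xa = split_index(m_x, num_xa)
    assert 0 <= m_xr < num_xr and 0 <= m_xa < num_xa
    assert merge_index(m_xr, m_xa, num_xa) == m_x
```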
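Example 3: Consider the case of $K = 2$ with switching dependent side information: $\mathcal{A} = \{1,2\}$ and $(X,Y) \sim p(x,y)$. The channel $P_{Y_1,Y_2|X,A}$ is specified by $Y_1 = Y$, $Y_2 = e$ when $A = 1$, and $Y_1 = e$, $Y_2 = Y$ when $A = 2$, where $e$ denotes an erasure symbol. Note that $A$ is a function of $Y_1$, and also of $Y_2$, so the condition in Theorem 4 is satisfied. Let $P(A = 1) = \alpha$, $\Lambda(A = 1) = C_1$ and $\Lambda(A = 2) = C_2$. We have

$$
\begin{aligned}
H(X|Y_1,A) &= \alpha H(X|A=1,Y) + (1-\alpha)H(X|A=2), \\
H(X|Y_2,A) &= (1-\alpha)H(X|A=2,Y) + \alpha H(X|A=1), \\
H(A|X) &= H_2(\alpha) - H(X) + \alpha H(X|A=1) + (1-\alpha)H(X|A=2).
\end{aligned}
$$

Hence, the rate-cost tradeoff is characterized by

$$
\begin{aligned}
R = \Big[&\max\{\alpha H(X|A=1,Y) + (1-\alpha)H(X|A=2),\ (1-\alpha)H(X|A=2,Y) + \alpha H(X|A=1)\} \\
&+ H(X) - H_2(\alpha) - \alpha H(X|A=1) - (1-\alpha)H(X|A=2)\Big]^+,
\end{aligned}
$$

minimized over $p(a|x)$ satisfying $\alpha C_1 + (1-\alpha)C_2 \le C$.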
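To make Theorem 4 and Example 3 concrete, the sketch below evaluates $[\max_j H(X|Y_j,A) - H(A|X)]^+$ directly from a joint p.m.f. for one fixed $p(a|x)$. It does not perform the minimization over $p(a|x)$. The following are for illustration only:

- $X \sim \mathrm{Bern}(1/2)$ and $Y = X \oplus \mathrm{Bern}(q)$;
- the string `"e"` plays the erasure symbol;
- all function names are hypothetical.

For $A$ independent of $X$, the output should agree with the closed form of Example 3. Under these assumptions that closed form reduces to $\max\{\alpha H_2(q) + (1-\alpha), (1-\alpha)H_2(q) + \alpha\} - H_2(\alpha)$, clipped at zero.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in the tuple idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[k] for k in idx)] += p
    return out


def cond_entropy(joint, target, given):
    """H(target | given) = H(target, given) - H(given), for index tuples."""
    return entropy(marginal(joint, target + given)) - entropy(marginal(joint, given))


def example3_joint(p_a1_given_x, q):
    """Joint pmf of (X, A, Y1, Y2) for an assumed instance of Example 3:
    X ~ Bern(1/2), Y = X xor Bern(q), A drawn from p(A = 1 | x),
    Y1 = Y if A = 1 else "e", Y2 = Y if A = 2 else "e"."""
    joint = defaultdict(float)
    for x in (0, 1):
        for y in (0, 1):
            p_xy = 0.5 * (q if x != y else 1 - q)
            for a in (1, 2):
                p_a = p_a1_given_x[x] if a == 1 else 1 - p_a1_given_x[x]
                y1 = y if a == 1 else "e"
                y2 = y if a == 2 else "e"
                joint[(x, a, y1, y2)] += p_xy * p_a
    return joint


def theorem4_rate(joint, num_decoders=2):
    """[max_j H(X|Y_j,A) - H(A|X)]^+ for ONE fixed p(a|x).
    Coordinates are (X, A, Y_1, ..., Y_K); Theorem 4 also minimizes over p(a|x)."""
    worst = max(cond_entropy(joint, (0,), (2 + j, 1)) for j in range(num_decoders))
    return max(0.0, worst - cond_entropy(joint, (1,), (0,)))


joint = example3_joint({0: 0.9, 1: 0.9}, q=0.1)   # A independent of X, P(A = 1) = 0.9
print(theorem4_rate(joint))
```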

VI. Other settings 

In this section, we consider other settings involving multi-terminal source coding with action dependent side 
information. The first setting that we consider in this section generalizes [2, Theorem 7] to the case where there 
is a rate-limited link from the source encoder to the action encoder. The second setting we consider is a case of 
successive refinement with actions. 

A. Single decoder with Markov Form X-A-Y and rate limited link to action encoder 

In this subsection, we consider the setting illustrated in Figure 8. Here, we have a single decoder, with actions taken at an action encoder. The source encoder has access to the source $X^n$ and sends out two indices, $M \in [1 : 2^{nR}]$ and $M_A \in [1 : 2^{nR_A}]$. The action encoder is a function $f : [1 : 2^{nR_A}] \to \mathcal{A}^n$. In addition, we have the Markov relation $X - A - Y$; that is, the side information $Y$ depends only on the action $A$ taken. The other definitions remain the same, and we omit them here.

Proposition 5: $R(D, C)$ for the setting shown in Figure 8 is given by

$$
R(D,C) = \min \max\{I(X;\hat{X}) - R_A,\ I(X;\hat{X}) - I(A;Y)\},
$$

where the minimization is over $p(x)p(a)p(y|a)p(\hat{x}|x)$ satisfying the distortion and cost constraints $E\,d(X,\hat{X}) \le D$ and $E\,\Lambda(A) \le C$.
Fig. 8: Lossy source coding with rate limited link to action encoder



Remark 6.1: If we set $R_A = \infty$ in Proposition 5, then we recover the result in [2, Theorem 7]. Essentially, the source encoder sends as much information as possible through the rate-limited action link until the link saturates.
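Proposition 5 can be evaluated in closed form for a simple example. The sketch below assumes, purely for illustration:

- a Bern(1/2) source with Hamming distortion;
- side information $Y = A \oplus \mathrm{Bern}(q)$, a BSC from action to side information, so that $X - A - Y$ holds;
- a cost budget loose enough not to bind.

The objective separates into a term depending only on $p(\hat{x}|x)$ and a term depending only on $p(a)$. The minimum therefore equals $R(D) - \min\{R_A, \max_{p(a)} I(A;Y)\}$ with $R(D) = 1 - H_2(D)$. The clipping at zero is an added assumption, since a rate cannot be negative, and the function names are hypothetical.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits, with the convention h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def prop5_rate_binary(d: float, r_a: float, q: float) -> float:
    """Rate of Proposition 5 for an assumed example: X ~ Bern(1/2) with Hamming
    distortion d, Y = A xor Bern(q), and a non-binding cost constraint.

    The objective separates, so
        min max{I(X;Xhat) - R_A, I(X;Xhat) - I(A;Y)}
          = min I(X;Xhat) - min(R_A, max over p(a) of I(A;Y)).
    """
    rd = 1.0 - h2(d) if d < 0.5 else 0.0   # binary rate distortion function R(d)
    capacity = 1.0 - h2(q)                 # max over p(a) of I(A;Y) for a BSC(q)
    # Added assumption: clip at zero, since a negative value only means that
    # the action link alone can carry the whole description.
    return max(0.0, rd - min(r_a, capacity))


print(prop5_rate_binary(d=0.1, r_a=0.25, q=0.2))      # rate-limited action link
print(prop5_rate_binary(d=0.1, r_a=math.inf, q=0.2))  # R_A = infinity, as in Remark 6.1
```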
Proof: 

Achievability: The achievability is straightforward. Using standard rate distortion coding, we cover $X^n$ with $2^{n(I(X;\hat{X})+\epsilon)}$ $\hat{X}^n$ codewords. Given a source sequence $x^n$, we find an $\hat{x}^n$ that is jointly typical with $x^n$. We then split the index $M_X$ corresponding to the chosen $\hat{X}^n$ codeword into two parts: $M_A \in [1 : 2^{n\min\{R_A,\, I(A;Y)-\epsilon\}}]$ and $M \in [1 : 2^{nR}]$. The action encoder takes the index $M_A$ and transmits it through the action channel. Since the rate of $M_A$ is at most $I(A;Y) - \epsilon$, the decoder can decode $M_A$ with high probability of success. It then combines $M_A$ with $M$ to obtain the index of the reconstruction codeword $\hat{X}^n$.
Converse: Given a code that satisfies the distortion and cost constraints, we have

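$$
\begin{aligned}
nR &\ge H(M) \\
&= I(X^n;M) \\
&\ge I(X^n;M,Y^n) - I(X^n;Y^n) \\
&\ge I(X^n;\hat{X}^n) - I(X^n,M_A;Y^n) \\
&\overset{(a)}{\ge} \sum_{i=1}^n I(X_i;\hat{X}_i) - I(X^n,M_A,A^n;Y^n) \\
&\overset{(b)}{\ge} \sum_{i=1}^n I(X_i;\hat{X}_i) - I(M_A,A^n;Y^n).
\end{aligned}
$$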



(a) follows from the fact that $A^n$ is a function of $M_A$; (b) follows from the Markov chain $X - A - Y$. Now, it is easy to see that $I(M_A, A^n; Y^n) \le \min\{nR_A, \sum_{i=1}^n I(A_i;Y_i)\}$. The bound on the rate is then single-letterized in the usual manner, giving us

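$$
R(D,C) \ge \min \max\{I(X;\hat{X}) - R_A,\ I(X;\hat{X}) - I(A;Y)\},
$$

where the minimization is over $p(a,\hat{x}|x)$ satisfying the distortion and cost constraints. Finally, we note that $p(a,\hat{x}|x)$ can be restricted to the form $p(a)p(\hat{x}|x)$. To see this, note that none of the terms depend on the joint $p(a,\hat{x}|x)$ beyond its marginals $p(a|x)$ and $p(\hat{x}|x)$. Furthermore, due to the Markov condition $X - A - Y$, the terms $I(A;Y)$ and $E\,\Lambda(A)$ depend on $A$ only through $p(a)$. It therefore suffices to consider $A$ independent of $X$, giving us the p.m.f. in the Proposition. ■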



B. Successive refinement with actions 

The next setup that we consider is a case of successive refinement [14], [15] with actions taken at the "more 
capable" decoder. The setting is shown in Figure 9. 
Fig. 9: Successive refinement with actions

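Proposition 6: Successive refinement with actions taken at the more capable decoder. For the setting shown in Figure 9, the rate-distortion-cost tradeoff region is given by

$$
\begin{aligned}
R_1 &\ge I(X;\hat{X}_1), \\
R_1 + R_2 &\ge I(X;\hat{X}_1,A) + I(X;U|\hat{X}_1,Y,A),
\end{aligned}
$$

for some $p(\hat{x}_1, a, u|x)$ satisfying

$$
\begin{aligned}
E\,d_1(X,\hat{X}_1) &\le D_1, \\
E\,d_2(X,\hat{X}_2(U,Y,A)) &\le D_2, \\
E\,\Lambda(A) &\le C.
\end{aligned}
$$

The cardinality of the auxiliary $U$ may be upper bounded by $|\mathcal{U}| \le |\mathcal{X}||\hat{\mathcal{X}}_1||\mathcal{A}| + 1$.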

If we restrict $R_2 = 0$, then Proposition 6 gives the rate-distortion-cost tradeoff region for a special case of Proposition 1, namely the case when $Y_2 = \emptyset$ and actions are taken only at decoder 1.
Proof: 

Achievability: We give the case where $R_1 = I(X;\hat{X}_1) + \epsilon$ and $R_2 = I(X;A|\hat{X}_1) + I(X;U|\hat{X}_1,Y,A) + 3\epsilon$. The general region stated in the Proposition can then be obtained by rate splitting of $R_2$.
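As a check on this corner point, here is a worked step added for the reader. The chain rule $I(X;\hat{X}_1) + I(X;A|\hat{X}_1) = I(X;\hat{X}_1,A)$ shows that the corner point meets the sum-rate bound of Proposition 6 to within $4\epsilon$, while $R_1$ meets its own bound to within $\epsilon$:

$$
\begin{aligned}
R_1 + R_2 &= I(X;\hat{X}_1) + I(X;A|\hat{X}_1) + I(X;U|\hat{X}_1,Y,A) + 4\epsilon \\
&= I(X;\hat{X}_1,A) + I(X;U|\hat{X}_1,Y,A) + 4\epsilon.
\end{aligned}
$$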



Codebook generation 

• Generate $2^{nR_1}$ sequences $\hat{X}_1^n(m_1)$, $m_1 \in [1 : 2^{nR_1}]$, according to $\prod_{i=1}^n p(\hat{x}_{1i})$.

• For each $\hat{x}_1^n(m_1)$ sequence, generate $2^{n(I(X;A|\hat{X}_1)+\epsilon)}$ sequences $A^n(m_1,m_{21})$, $m_{21} \in [1 : 2^{n(I(X;A|\hat{X}_1)+\epsilon)}]$, according to $\prod_{i=1}^n p(a_i|\hat{x}_{1i})$.

• For each $(\hat{x}_1^n(m_1), a^n(m_1,m_{21}))$ pair, generate $2^{n(I(X;U|\hat{X}_1,A)+\epsilon)}$ sequences $U^n(m_1,m_{21},l_{22})$ according to $\prod_{i=1}^n p(u_i|\hat{x}_{1i},a_i)$.

• Partition the set of $l_{22}$ indices into $2^{n(I(X;U|\hat{X}_1,Y,A)+2\epsilon)}$ bins $\mathcal{B}(m_1,m_{21},m_{22})$, $m_{22} \in [1 : 2^{n(I(X;U|\hat{X}_1,Y,A)+2\epsilon)}]$.



• Given a sequence $x^n$, the encoder first looks for an $\hat{x}_1^n(m_1)$ sequence such that $(x^n, \hat{x}_1^n) \in \mathcal{T}_\epsilon^{(n)}$. This step succeeds with high probability since $R_1 = I(X;\hat{X}_1) + \epsilon$.

• Next, the encoder looks for an $a^n(m_1,m_{21})$ sequence such that $(x^n, a^n, \hat{x}_1^n) \in \mathcal{T}_\epsilon^{(n)}$. This step succeeds with high probability since there are $2^{n(I(X;A|\hat{X}_1)+\epsilon)}$ $A^n$ sequences.

• The encoder then looks for a $u^n(m_1,m_{21},l_{22})$ sequence such that $(x^n, a^n, \hat{x}_1^n, u^n) \in \mathcal{T}_\epsilon^{(n)}$. This step succeeds with high probability since there are $2^{n(I(X;U|\hat{X}_1,A)+\epsilon)}$ $U^n$ sequences.

• It then finds the bin index $m_{22}$ such that $l_{22} \in \mathcal{B}(m_1,m_{21},m_{22})$.

• The encoder sends out the index $m_1$ over the link of rate $R_1$, and the indices $m_{21}$ and $m_{22}$ over the link of rate $R_2$, giving us the stated rates.

Decoding and reconstruction 

• Since decoder 1 has the index $m_1$, it reconstructs $\hat{x}_1^n(m_1)$. Since $(x^n, \hat{x}_1^n)$ are jointly typical with high probability, the expected distortion satisfies the $D_1$ distortion constraint to within $\epsilon$.

• Decoder 2 recovers the action sequence $a^n(m_1,m_{21})$ from $m_1$ and $m_{21}$. It then takes the action $a^n(m_1,m_{21})$ to obtain its side information $Y^n$. With the side information, it recovers the $u^n$ sequence by looking for the unique $l_{22} \in \mathcal{B}(m_1,m_{21},m_{22})$ such that $(u^n(m_1,m_{21},l_{22}), \hat{x}_1^n, a^n, Y^n) \in \mathcal{T}_\epsilon^{(n)}$. Two facts control the error probability. First, there are only $2^{n(I(U;Y|\hat{X}_1,A)-\epsilon)}$ $U^n$ sequences in each bin. Second, $(u^n(m_1,m_{21},l_{22}), \hat{x}_1^n, a^n, Y^n) \in \mathcal{T}_\epsilon^{(n)}$ with high probability, since $Y^n$ is generated i.i.d. according to $p(y_i|a_i,x_i)$. Hence the probability of error goes to zero as $n \to \infty$. Decoder 2 then reconstructs $\hat{x}_2^n$ using $\hat{x}_{2i}(a_i,u_i,y_i)$ for $i \in [1:n]$.

Converse: We consider only the lower bound for $R_1 + R_2$; the lower bound for $R_1$ is straightforward. Given a code which satisfies the distortion and cost constraints, we have

$$
\begin{aligned}
n(R_1+R_2) &\ge H(M_1,M_2) \\
&= H(M_1,M_2,A^n,\hat{X}_1^n) \\
&= H(A^n,\hat{X}_1^n) + H(M_1,M_2|A^n,\hat{X}_1^n) \\
&\ge I(X^n;A^n,\hat{X}_1^n) + H(M_1,M_2|A^n,\hat{X}_1^n,Y^n) - H(M_1,M_2|Y^n,A^n,\hat{X}_1^n,X^n) \\
&= I(X^n;A^n,\hat{X}_1^n) + I(X^n;M_1,M_2|A^n,\hat{X}_1^n,Y^n) \\
&= I(X^n;A^n,\hat{X}_1^n) + H(X^n|A^n,\hat{X}_1^n,Y^n) - H(X^n|A^n,\hat{X}_1^n,Y^n,M_1,M_2) \\
&= I(X^n;A^n,\hat{X}_1^n) + H(X^n,Y^n|A^n,\hat{X}_1^n) - H(Y^n|\hat{X}_1^n,A^n) - H(X^n|A^n,\hat{X}_1^n,Y^n,M_1,M_2) \\
&= H(X^n) - H(X^n|A^n,\hat{X}_1^n) + H(X^n,Y^n|A^n,\hat{X}_1^n) - H(Y^n|\hat{X}_1^n,A^n) - H(X^n|A^n,\hat{X}_1^n,Y^n,M_1,M_2) \\
&= H(X^n) + H(Y^n|X^n,A^n,\hat{X}_1^n) - H(Y^n|\hat{X}_1^n,A^n) - H(X^n|A^n,\hat{X}_1^n,Y^n,M_1,M_2) \\
&\ge \sum_{i=1}^n \left( H(X_i) + H(Y_i|X^n,A^n,\hat{X}_1^n,Y^{i-1}) - H(Y_i|\hat{X}_1^n,A^n,Y^{i-1}) - H(X_i|A^n,\hat{X}_1^n,Y^n,M_1,M_2) \right) \\
&\overset{(a)}{\ge} \sum_{i=1}^n \left( H(X_i) + H(Y_i|X_i,A_i,\hat{X}_{1i}) - H(Y_i|\hat{X}_{1i},A_i) - H(X_i|A^n,\hat{X}_1^n,Y^n,M_1,M_2) \right) \\
&\ge \sum_{i=1}^n \left( H(X_i) + H(Y_i|X_i,A_i,\hat{X}_{1i}) - H(Y_i|\hat{X}_{1i},A_i) - H(X_i|U_i,A_i,Y_i,\hat{X}_{1i}) \right).
\end{aligned}
$$

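(a) follows from the Markov chain $(X^{n\setminus i}, A^{n\setminus i}, \hat{X}_1^{n\setminus i}, Y^{i-1}) - (X_i, \hat{X}_{1i}, A_i) - Y_i$ and the fact that conditioning reduces entropy. The last step follows from defining $U_i = (M_1, M_2, Y^{n\setminus i}, A^{n\setminus i})$ and noting that $\hat{X}_1^n$ is a function of $M_1$. The proof is then completed in the usual manner by defining the time-sharing uniform random variable $Q$ and $U = (U_Q, Q)$, giving us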

$$
\begin{aligned}
R_1 + R_2 &\ge H(X) + H(Y|X,A,\hat{X}_1) - H(Y|\hat{X}_1,A) - H(X|U,A,Y,\hat{X}_1) \\
&= I(X;\hat{X}_1,A) + I(X;U|\hat{X}_1,Y,A).
\end{aligned}
$$
It is straightforward to see that $\hat{X}_2$ is a function of $U$, $Y$ and $A$, since $\hat{X}_{2i}$ is a function of $(M_1, M_2, A^n, Y^n) = (U_i, A_i, Y_i)$. Finally, the cardinality bound on $U$ may be obtained from standard techniques: we need $|\mathcal{X}||\hat{\mathcal{X}}_1||\mathcal{A}| - 1$ letters to preserve $p(x, \hat{x}_1, a)$ and two more to preserve the rate and distortion constraints. ■
Remark 6.2: An interesting question is to characterize the more general case in which degraded side information is also available at decoder 1. That is, the side information $Y_1$ at decoder 1 and $Y_2$ at decoder 2 are generated by a discrete memoryless channel $P_{Y_1,Y_2|X,A}$ such that $(X,A) - (Y_2,A) - (Y_1,A)$. This generalized setup would allow us to generalize Proposition 1 entirely. It would also lead to a generalization of successive refinement for the Wyner-Ziv problem in [16] to the action setting.

VII. Conclusion 

In this paper, we considered an important class of multi-terminal source coding problems, in which the encoder sends the description of the source to the decoders, which then take cost-constrained actions that affect the quality or availability of side information. We computed the optimum rate region for lossless compression. For the lossy case, we provided a general achievability scheme that is shown to be optimal for a number of special cases, one of them being a generalization of the Heegard-Berger-Kaspi setting (cf. [10], [11]). In all these cases, in addition to a standard achievability argument, we also provided a simple scheme that has a modulo sum interpretation. The problem where the encoder takes actions rather than the decoders was also considered. Finally, we extended the scope to additional multi-terminal source coding problems, such as successive refinement with actions.
Appendix A

ACHIEVABILITY SKETCH FOR THEOREM 3 

Codebook generation 

• Generate $2^{n(I(X;A)+\epsilon)}$ sequences $A^n(l_a)$, $l_a \in [1 : 2^{n(I(X;A)+\epsilon)}]$, according to $\prod_{i=1}^n p(a_i)$.

• For each $A^n$ sequence, generate $2^{n(I(U;X|A)+\epsilon)}$ sequences $U^n(l_a,l_0)$, $l_0 \in [1 : 2^{n(I(U;X|A)+\epsilon)}]$, according to $\prod_{i=1}^n p_{U|A}(u_i|a_i)$.

• Partition the set of indices corresponding to the $U^n$ codewords uniformly into $2^{n(\max\{I(X;U|A,Y_1),\, I(X;U|A,Y_2)\}+2\epsilon)}$ bins $\mathcal{B}_U(l_a,m_0)$, $m_0 \in [1 : 2^{n(\max\{I(X;U|A,Y_1),\, I(X;U|A,Y_2)\}+2\epsilon)}]$.

• For each pair of $A^n$ and $U^n$ sequences, generate $2^{n(I(V_1;X|A,U)+\epsilon)}$ sequences $V_1^n(l_a,l_0,l_1)$, $l_1 \in [1 : 2^{n(I(V_1;X|A,U)+\epsilon)}]$, according to $\prod_{i=1}^n p_{V_1|A,U}(v_{1i}|a_i,u_i)$.

• Partition the set of indices corresponding to the $V_1^n$ codewords uniformly into $2^{n(I(X;V_1|U,A,Y_1)+2\epsilon)}$ bins $\mathcal{B}_{V_1}(l_a,l_0,m_1)$, $m_1 \in [1 : 2^{n(I(X;V_1|U,A,Y_1)+2\epsilon)}]$.

• For each pair of $A^n$ and $U^n$ sequences, generate $2^{n(I(V_2;X|A,U)+\epsilon)}$ sequences $V_2^n(l_a,l_0,l_2)$, $l_2 \in [1 : 2^{n(I(V_2;X|A,U)+\epsilon)}]$, according to $\prod_{i=1}^n p_{V_2|A,U}(v_{2i}|a_i,u_i)$.

• Partition the set of indices corresponding to the $V_2^n$ codewords uniformly into $2^{n(I(X;V_2|U,A,Y_2)+2\epsilon)}$ bins $\mathcal{B}_{V_2}(l_a,l_0,m_2)$, $m_2 \in [1 : 2^{n(I(X;V_2|U,A,Y_2)+2\epsilon)}]$.

• Given an $x^n$ sequence, the encoder first looks for an $a^n(l_a)$ sequence such that $(x^n, a^n) \in \mathcal{T}_\epsilon^{(n)}$. If there is none, it outputs an index chosen uniformly at random from the set of possible $l_a$ indices. If there is more than one, it outputs an index chosen uniformly at random from the set of feasible indices. Since there are $2^{n(I(X;A)+\epsilon)}$ such sequences, the probability of error $\to 0$ as $n \to \infty$.

• The encoder then looks for a $u^n(l_a,l_0)$ sequence that is jointly typical with $(a^n(l_a), x^n)$. If there is none, it outputs an index chosen uniformly at random from the set of possible $l_0$ indices. If there is more than one, it outputs an index chosen uniformly at random from the set of feasible indices. Since there are $2^{n(I(U;X|A)+\epsilon)}$ such sequences, the probability of error $\to 0$ as $n \to \infty$.

• Next, the encoder looks for a $v_1^n(l_a,l_0,l_1)$ sequence that is jointly typical with $(a^n(l_a), u^n(l_a,l_0), x^n)$. If there is none, it outputs an index chosen uniformly at random from the set of possible $l_1$ indices. If there is more than one, it outputs an index chosen uniformly at random from the set of feasible indices. Since there are $2^{n(I(V_1;X|A,U)+\epsilon)}$ such sequences, the probability of error $\to 0$ as $n \to \infty$.

• Next, the encoder looks for a $v_2^n(l_a,l_0,l_2)$ sequence that is jointly typical with $(a^n(l_a), u^n(l_a,l_0), x^n)$. If there is none, it outputs an index chosen uniformly at random from the set of possible $l_2$ indices. If there is more than one, it outputs an index chosen uniformly at random from the set of feasible indices. Since there are $2^{n(I(V_2;X|A,U)+\epsilon)}$ such sequences, the probability of error $\to 0$ as $n \to \infty$.

• The encoder then sends out the indices $l_a$, $m_0$, $m_1$ and $m_2$ such that $l_0 \in \mathcal{B}_U(l_a,m_0)$, $l_1 \in \mathcal{B}_{V_1}(l_a,l_0,m_1)$ and $l_2 \in \mathcal{B}_{V_2}(l_a,l_0,m_2)$.

Decoding and reconstruction 
Decoder 1: 

• Decoder 1 first takes the action sequence $a^n(l_a)$ to obtain the side information $Y_1^n$. We note that if $(a^n(l_a), x^n, u^n(l_a,l_0), v_1^n(l_a,l_0,l_1)) \in \mathcal{T}_\epsilon^{(n)}$, then $P\{(a^n(l_a), x^n, u^n(l_a,l_0), v_1^n(l_a,l_0,l_1), Y_1^n) \in \mathcal{T}_\epsilon^{(n)}\} \to 1$ as $n \to \infty$. This follows from the conditional typicality lemma [5, Chapter 2] and the fact that $Y_1^n \sim \prod_{i=1}^n p(y_{1i}|x_i,a_i)$.

• Decoder 1 then decodes $U^n$. It does this by finding the unique $l_0 \in \mathcal{B}_U(l_a,m_0)$ such that $(u^n(l_a,l_0), a^n(l_a), Y_1^n) \in \mathcal{T}_\epsilon^{(n)}$. If there is none or more than one such $l_0$, an error is declared. Following the standard analysis for the Wyner-Ziv setup (see, e.g., [5, Chapter 12]), the probability of error goes to zero as $n \to \infty$ since there are at most $2^{n(I(U;Y_1|A)-\epsilon)}$ $U^n$ sequences within each bin.

• Similarly, decoder 1 decodes $V_1^n$ by finding the unique $l_1 \in \mathcal{B}_{V_1}(l_a,l_0,m_1)$ such that $(v_1^n(l_a,l_0,l_1), u^n(l_a,l_0), a^n(l_a), Y_1^n) \in \mathcal{T}_\epsilon^{(n)}$. If there is none or more than one such $l_1$, an error is declared. As with the previous step, the probability of error goes to zero as $n \to \infty$ since there are only $2^{n(I(V_1;Y_1|A,U)-\epsilon)}$ $V_1^n$ sequences within each bin.

• Decoder 1 then reconstructs $\hat{x}_1^n$ as $\hat{x}_{1i}(a_i(l_a), u_i(l_a,l_0), v_{1i}(l_a,l_0,l_1), y_{1i})$ for $i \in [1:n]$.
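The bin sizes quoted in the decoding steps follow from subtracting the binning rate from the codebook rate. This worked step, added for the reader, uses the identity $I(U;X|A) - I(U;X|A,Y_j) = I(U;Y_j|A)$. The identity holds because $U - (X,A) - Y_j$ is a Markov chain in this scheme:

$$
\begin{aligned}
\frac{1}{n}\log_2 \frac{\#\{U^n \text{ codewords per } a^n\}}{\#\{\text{bins } m_0\}}
&= I(U;X|A) + \epsilon - \max\{I(X;U|A,Y_1),\, I(X;U|A,Y_2)\} - 2\epsilon \\
&= \min\{I(U;Y_1|A),\, I(U;Y_2|A)\} - \epsilon \;\le\; I(U;Y_1|A) - \epsilon.
\end{aligned}
$$

The same computation gives the $V_1^n$ bin size, using $I(V_1;X|A,U) - I(X;V_1|U,A,Y_1) = I(V_1;Y_1|A,U)$ (from the Markov chain $V_1 - (X,A,U) - Y_1$). The $V_2^n$ bin size follows likewise.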






Decoder 2: As the decoding steps for decoder 2 are similar to those for decoder 1, we only mention the differences here. Decoder 2 uses the side information $Y_2^n$ instead of $Y_1^n$ to perform the decoding operations, and it decodes $V_2^n$ instead of $V_1^n$.

• Decoder 2 decodes $V_2^n$ by finding the unique $l_2 \in \mathcal{B}_{V_2}(l_a,l_0,m_2)$ such that $(v_2^n(l_a,l_0,l_2), u^n(l_a,l_0), a^n(l_a), Y_2^n) \in \mathcal{T}_\epsilon^{(n)}$. If there is none or more than one such $l_2$, an error is declared. As with decoder 1, the probability of error goes to zero as $n \to \infty$ since there are only $2^{n(I(V_2;Y_2|A,U)-\epsilon)}$ $V_2^n$ sequences within each bin.

• Decoder 2 then reconstructs $\hat{x}_2^n$ as $\hat{x}_{2i}(a_i(l_a), u_i(l_a,l_0), v_{2i}(l_a,l_0,l_2), y_{2i})$ for $i \in [1:n]$.

Distortion and cost constraints 

• For the cost constraint, since the chosen $A^n$ sequence is typical with high probability, $E\,\Lambda(A^n) \le C + \epsilon$ by the typical average lemma [5, Chapter 2].

• For the distortion constraints, since the probability of "error" goes to zero as $n \to \infty$ and we are dealing only with finite cardinality random variables, following the analysis in [5, Chapter 3], we have

$$
\frac{1}{n} E\, d_1(X^n,\hat{X}_1^n) \le D_1 + \epsilon, \qquad \frac{1}{n} E\, d_2(X^n,\hat{X}_2^n) \le D_2 + \epsilon.
$$